TransHP is great work! I have a small question: when calculating the absorption weight of the target prompt using the self-attention mechanism, is the computed value the average absorption weight over all feature tokens?
Thanks for this insightful question. After checking, it seems that is correct. Since it has been quite a while (about 2 years)
since I finished this paper, I am not entirely sure :)
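
To make the quantity under discussion concrete, here is a minimal sketch of what "the average absorption weight over all feature tokens" could look like. It is not taken from the TransHP code: the function name `mean_absorption_weight`, the single-head attention, and the index arguments are all assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_absorption_weight(q, k, feature_idx, prompt_idx):
    """Average attention weight that the feature tokens place on one prompt token.

    Hypothetical helper, not the TransHP implementation.
    q, k: (num_tokens, dim) query and key matrices for a single head.
    feature_idx: indices of the feature (patch) tokens.
    prompt_idx: index of the target prompt token.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # each row sums to 1
    # Take the column of the target prompt, restricted to feature-token rows,
    # then average over those rows.
    return attn[feature_idx, prompt_idx].mean()

rng = np.random.default_rng(0)
q = rng.normal(size=(6, 8))   # assumed layout: 5 feature tokens + 1 prompt token
k = rng.normal(size=(6, 8))
w = mean_absorption_weight(q, k, np.arange(5), 5)
```

Under these assumptions, `w` is a single scalar in [0, 1]: the mean, over the five feature-token rows, of the attention each one assigns to the prompt token.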