QingyaoAi / Deep-Listwise-Context-Model-for-Ranking-Refinement

A Tensorflow implementation of the Deep Listwise Context Model (DLCM) for ranking refinement.
Apache License 2.0

Attention Rank loss during training #7

Open urnotLeo opened 3 years ago

urnotLeo commented 3 years ago

Thank you for your previous work! I'm trying to apply this idea to a project of mine, but I've run into some problems.

May I ask whether the value of the attention rank loss was very high during your training? My reranking task has 30 documents per query, which results in a loss value close to 70, so the learning rate needs to be very small for the network to converge. I don't know if that's right.

And in the best case, when the predicted attention strategy equals the best attention strategy, the attention rank loss is still not 0. Will this affect back-propagation? Is there any problem with the direction of the gradient updates?

I hope you can clear up my confusion. Thank you so much!

QingyaoAi commented 3 years ago

Thanks for asking! The loss (I assume that you are talking about perplexity, i.e., exp(loss)) could be very large in the beginning (e.g., 8000+ for me), but shouldn't be large when the algorithm converges (e.g., 10+ on the Yahoo! dataset).
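For a quick sanity check on those numbers, here is the conversion between the printed perplexity and the underlying loss, assuming the reported value really is the perplexity, i.e., exp(loss):

```python
import math

math.log(8000)  # ≈ 8.99 -> an early-training perplexity of 8000+ is a loss of roughly 9
math.log(10)    # ≈ 2.30 -> a converged perplexity of ~10 is a loss of roughly 2.3
math.exp(70)    # ≈ 2.5e30 -> a raw loss of 70 would correspond to an astronomically large perplexity
```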

Yes, due to some tricks we added to prevent numerical problems (e.g., log(0)), the loss won't be 0 even when the predicted attention is the best attention strategy, but it should be very small.
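For intuition, here is a minimal NumPy sketch of that behaviour. It assumes the attention rank loss is a cross-entropy between the predicted attention (a softmax over the scores) and a "best" attention distribution derived from the graded relevance labels, roughly as in the paper; the rectified-exponential target and the epsilon smoothing below are illustrative, not the repository's exact code. Even when the prediction matches the target, the loss bottoms out at the target's entropy rather than at 0:

```python
import numpy as np

def attention_rank_loss(scores, labels, eps=1e-10):
    # "Best" attention: rectified exponential of the relevance labels, normalised
    # over the list (documents with label 0 get zero target attention).
    exp_labels = np.where(labels > 0, np.exp(labels), 0.0)
    best_attention = exp_labels / max(exp_labels.sum(), eps)
    # Predicted attention: softmax over the predicted scores.
    predicted = np.exp(scores) / np.exp(scores).sum()
    # Cross-entropy; eps keeps log() away from 0.
    return -np.sum(best_attention * np.log(predicted + eps))

labels = np.array([2.0, 1.0, 0.0, 0.0])  # toy graded relevance for a 4-document list
# Scores whose softmax reproduces the best attention almost exactly.
perfect_scores = np.log(np.where(labels > 0, np.exp(labels), 1e-10))
print(attention_rank_loss(perfect_scores, labels))  # ≈ 0.58: the target's entropy, small but not 0
```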

urnotLeo commented 3 years ago

OK, got it! Thanks! One more question: since the loss is so large, does the learning rate need to be small?

QingyaoAi commented 3 years ago

Yes, it's better to start with a small learning rate and increase it when necessary. A large learning rate could be fine since we cap the gradient norm, but it's not preferred.
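If it helps, this is the gradient-norm capping being referred to, sketched in current TensorFlow style (the repository itself uses the older graph-mode API, but the effect is the same); the clip value and learning rate are illustrative, not the repository's settings:

```python
import tensorflow as tf

max_gradient_norm = 5.0  # illustrative cap, not the repository's default
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def train_step(model, features, targets, loss_fn):
    with tf.GradientTape() as tape:
        loss = loss_fn(model(features), targets)
    grads = tape.gradient(loss, model.trainable_variables)
    # Rescale the whole gradient so its global norm never exceeds max_gradient_norm;
    # this is why a large loss with a larger learning rate doesn't immediately diverge.
    clipped, _ = tf.clip_by_global_norm(grads, max_gradient_norm)
    optimizer.apply_gradients(zip(clipped, model.trainable_variables))
    return loss
```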

urnotLeo commented 3 years ago

All right! Thank you very much for your answer. Have a nice day~