Closed bluenights closed 2 years ago
The hinge loss encourages the standard deviation to get close to \gamma, but there is no force encouraging it to be much greater than \gamma. Indeed, the hinge is the max between 0 and \gamma - std, so when std is greater than \gamma the loss is 0. In practice we observe that std will be slightly lower than \gamma without reaching exactly \gamma, which is enough to effectively maintain the variance of the embeddings and prevent collapse.
I hope this answers your question. I am closing the issue since it is not related to the code. If you have additional questions, feel free to contact me by e-mail at abardes@fb.com
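To make the behaviour of the hinge concrete, here is a minimal pure-Python sketch of max(0, \gamma - std) averaged over the embedding dimensions. It is only an illustration, not the repository's implementation: the function name `hinge_std_loss`, the small `eps` added under the square root, and the use of the unbiased (n - 1) variance are assumptions made for this example.

```python
import math


def hinge_std_loss(embeddings, gamma=1.0, eps=1e-4):
    """Mean over dimensions of max(0, gamma - std_j).

    `embeddings` is a list of equal-length rows (one row per sample).
    `eps` and the unbiased variance are assumptions for this sketch.
    """
    n = len(embeddings)
    d = len(embeddings[0])
    total = 0.0
    for j in range(d):
        column = [row[j] for row in embeddings]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / (n - 1)
        std = math.sqrt(var + eps)
        # The hinge: penalise only when std is below gamma.
        total += max(0.0, gamma - std)
    return total / d


# Collapsed embeddings: every sample is identical, so std is about 0.
collapsed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
# Spread-out embeddings: std is well above gamma in both dimensions.
spread = [[-2.0, 3.0], [0.0, -3.0], [2.0, 0.0]]

print(hinge_std_loss(collapsed))  # close to gamma, the largest possible value
print(hinge_std_loss(spread))     # exactly 0.0, no push in either direction
```

Once std exceeds \gamma in every dimension, the loss and its gradient are both zero, so this term neither increases nor limits the variance any further.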
Hi, thanks for the great work!
Would you please post a figure showing how std_loss changes over the training epochs? It would be very helpful, because my std_loss decays very slowly: it has only gone from 23 to 21.
Thanks!
Thanks for the great work! It is stated in the paper (Sec.4.1) that the hinge function encourages the variance to be equal to \gamma. I think it should be above \gamma instead. Is that correct? Will there be a situation where the variance becomes too large?