Open nbstrong opened 5 years ago
Yes, I'm aware; I commented on the thread as well. The implementation is technically correct in that it follows the loss formulation from the papers. But if we look at the gradients, it can indeed be problematic and suboptimal. Even if this formulation seems to work in practice in many cases, users should be aware of potential issues - I'll add a clarification and loss alternatives.
Hi,
From what I understood from the Twitter discussion, the power of ½ creates a stronger push (a larger gradient) against negatives when they are close. Is that correct?
Moreover, what's the point of the margin when, from what I understand, it is zeroed out in the gradient calculation?
Thanks
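
To make the two questions above concrete, here is a rough numerical sketch, not the repository's code. It assumes the negative-pair term is `0.5 * max(0, margin - d)^2` with `d` the Euclidean distance (the "power of ½" form), and compares it with a hypothetical hinge on the squared distance, `0.5 * max(0, margin - d^2)`. The function names and the comparison variant are illustrative assumptions; which form the Twitter discussion had in mind may differ.

```python
def loss_sqrt(d, margin=1.0):
    # Negative-pair term with the square root: hinge on the distance d, then squared.
    return 0.5 * max(0.0, margin - d) ** 2

def loss_squared(d, margin=1.0):
    # Hypothetical variant for comparison: hinge applied to the squared distance.
    return 0.5 * max(0.0, margin - d * d)

def num_grad(f, d, h=1e-6):
    # Central finite difference of f with respect to the distance d.
    return (f(d + h) - f(d - h)) / (2 * h)

if __name__ == "__main__":
    for d in (0.01, 0.5, 0.9):
        print(d, num_grad(loss_sqrt, d), num_grad(loss_squared, d))
```

Under these assumptions, inside the margin the gradient of the square-root form is `-(margin - d)`, so its magnitude is largest when the negatives are closest. For the squared-distance variant it is `-d`, which vanishes as the pair collapses, and the margin drops out of the gradient value. In that variant the margin still decides which pairs are active, since pairs with `d^2 >= margin` contribute no gradient at all.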
https://twitter.com/alfcnz/status/1133372277876068352
There's some discussion going on in the replies as well, but if there is an issue, it should be addressed here.