Twitter Issue: "triplet loss is flawed"

adambielski / siamese-triplet

Siamese and triplet networks with online pair/triplet mining in PyTorch

BSD 3-Clause "New" or "Revised" License

3.1k stars, 633 forks

Twitter Issue: "triplet loss is flawed" #31

Open: nbstrong opened this issue 5 years ago

nbstrong commented 5 years ago

https://twitter.com/alfcnz/status/1133372277876068352

Unfortunately that triplet loss is flawed. The most offending negative sample has zero gradient. That power of 2 should be a power of ½.
I feel bad so many people still use it. 😕 https://t.co/M3daSGzlMK
— Alfredo Canziani (@alfcnz) May 28, 2019

There's some discussion going on in the replies to that tweet as well, but if there is an issue, it should be addressed here.
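To see where the tweet's objection comes from, write the hinge triplet loss for an anchor a, positive p, negative n and margin m (notation chosen here for illustration, not taken from the repository) and differentiate it with respect to the negative while the hinge is active. With squared distances, the gradient shrinks in proportion to the anchor-negative distance and vanishes when the negative sits on the anchor, which is exactly the most offending case. With plain Euclidean distance (the "power of ½" fix), its norm stays at 1 for any n ≠ a.

```latex
\mathcal{L}(a, p, n) = \max\bigl(0,\; D(a, p) - D(a, n) + m\bigr)

% Squared Euclidean distance, D(u, v) = \|u - v\|_2^2 (hinge active):
\frac{\partial \mathcal{L}}{\partial n} = -2\,(n - a),
\qquad
\Bigl\|\frac{\partial \mathcal{L}}{\partial n}\Bigr\|_2 = 2\,\|n - a\|_2 \xrightarrow[n \to a]{} 0

% Plain Euclidean distance, D(u, v) = \|u - v\|_2 (hinge active, n \neq a):
\frac{\partial \mathcal{L}}{\partial n} = -\frac{n - a}{\|n - a\|_2},
\qquad
\Bigl\|\frac{\partial \mathcal{L}}{\partial n}\Bigr\|_2 = 1
```

A gradient-descent step subtracts this gradient, so in both cases the negative is pushed along n - a, away from the anchor; only the step size differs. The plain norm is not differentiable at n = a itself, which is why implementations commonly add a small epsilon inside the square root.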

adambielski commented 5 years ago

Yes, I'm aware; I commented on the thread as well. The implementation is technically correct: it follows the loss formulation from the papers. But if we look at the gradients, it can indeed be problematic and suboptimal. Even though this formulation seems to work in practice in many cases, users should be aware of the potential issues. I'll add a clarification and some alternative losses.
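Both variants discussed here are the same hinge loss and differ only in the distance. As a minimal sketch (plain Python rather than PyTorch so it runs anywhere; `distance` and `triplet_loss` are hypothetical names, not this repository's API), a `squared` flag switches between the squared Euclidean distance criticized in the thread and the plain Euclidean distance suggested as the fix:

```python
import math
from typing import Sequence

Vector = Sequence[float]


def distance(u: Vector, v: Vector, squared: bool) -> float:
    """Squared Euclidean distance (the form criticized in the thread) or
    plain Euclidean distance (the suggested 'power of 1/2' fix)."""
    sq = sum((x - y) ** 2 for x, y in zip(u, v))
    return sq if squared else math.sqrt(sq)


def triplet_loss(
    anchors: Sequence[Vector],
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    margin: float = 1.0,
    squared: bool = True,
) -> float:
    """Mean over triplets of max(0, D(a, p) - D(a, n) + margin)."""
    losses = [
        max(0.0, distance(a, p, squared) - distance(a, n, squared) + margin)
        for a, p, n in zip(anchors, positives, negatives)
    ]
    return sum(losses) / len(losses)


batch = ([[0.0, 0.0]], [[1.0, 0.0]], [[1.5, 0.0]])
print(triplet_loss(*batch, margin=1.0, squared=True))   # 0.0
print(triplet_loss(*batch, margin=1.0, squared=False))  # 0.5
```

For this single triplet, the squared version gives 1 - 2.25 + 1 = -0.25, so the hinge is inactive and the loss is 0, while the plain version gives 1 - 1.5 + 1 = 0.5. Switching the distance also changes the scale on which the margin is measured, so a margin tuned for one formulation is not automatically appropriate for the other.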

jonkoi commented 5 years ago

Hi,

From what I understood from the Twitter discussion, using a power of ½ creates a stronger push (a larger gradient) on negatives when they are close to the anchor. Is that correct?
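A quick numerical check of that reading, assuming the standard hinge triplet loss with anchor a and negative n (the helper `grad_norm_wrt_negative` is hypothetical): while the hinge is active, the squared distance pushes the negative with gradient norm 2·||a - n||, and the plain distance with norm 1. The plain distance therefore pushes harder only when ||a - n|| < 0.5 in the embedding's units; farther out, the squared version pushes harder. The decisive difference is that the squared version's push goes to zero for the hardest negatives.

```python
import math
from typing import Sequence


def grad_norm_wrt_negative(
    anchor: Sequence[float], negative: Sequence[float], squared: bool
) -> float:
    """Norm of the gradient of -D(anchor, negative) w.r.t. the negative,
    i.e. how hard an active triplet pushes the negative away."""
    diff = [n - a for a, n in zip(anchor, negative)]  # n - a
    dist = math.sqrt(sum(x * x for x in diff))
    if squared:
        grad = [-2.0 * x for x in diff]  # D = ||a - n||^2
    else:
        if dist == 0.0:
            raise ValueError("||a - n|| is not differentiable at n == a")
        grad = [-x / dist for x in diff]  # D = ||a - n||
    return math.sqrt(sum(g * g for g in grad))


# Compare both formulations for negatives at increasing distance from the anchor.
for d in (0.01, 0.25, 0.5, 1.0, 2.0):
    sq = grad_norm_wrt_negative([0.0, 0.0], [d, 0.0], squared=True)
    plain = grad_norm_wrt_negative([0.0, 0.0], [d, 0.0], squared=False)
    print(f"distance={d:<5} squared={sq:.2f} plain={plain:.2f}")
```

The two columns tie at distance 0.5; below that the plain distance gives the larger push, above it the squared distance does.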

Moreover, what's the point of the margin when, from what I understand, it is zeroed out in the gradient calculation?

Thanks
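Regarding the question about the margin: m is a constant, so it does drop out of the derivative, but it decides whether max(0, ·) is active at all, and an inactive triplet contributes no gradient. The margin therefore selects which triplets are trained on rather than how hard they are pushed. A minimal sketch of that behaviour (the helper `hinge_grad_wrt_negative` is hypothetical, written in plain Python for illustration):

```python
import math
from typing import List, Sequence


def hinge_grad_wrt_negative(
    a: Sequence[float],
    p: Sequence[float],
    n: Sequence[float],
    margin: float,
    squared: bool = True,
) -> List[float]:
    """Gradient of max(0, D(a, p) - D(a, n) + margin) with respect to n."""

    def dist(u: Sequence[float], v: Sequence[float]) -> float:
        sq = sum((x - y) ** 2 for x, y in zip(u, v))
        return sq if squared else math.sqrt(sq)

    # The margin only enters here: it decides whether the hinge is active.
    if dist(a, p) - dist(a, n) + margin <= 0.0:
        return [0.0] * len(n)  # inactive triplet: no gradient at all

    diff = [ni - ai for ai, ni in zip(a, n)]  # n - a
    if squared:
        return [-2.0 * x for x in diff]  # d(-||a - n||^2)/dn
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("||a - n|| is not differentiable at n == a")
    return [-x / norm for x in diff]  # d(-||a - n||)/dn


a, p, n = [0.0, 0.0], [1.0, 0.0], [1.5, 0.0]
for m in (0.1, 2.0, 5.0):
    print(m, hinge_grad_wrt_negative(a, p, n, margin=m))
```

With squared distances, margin 0.1 leaves the hinge inactive (1 - 2.25 + 0.1 < 0) and returns a zero gradient, while margins 2.0 and 5.0 both activate it and return the same gradient, -2(n - a) = (-3, 0).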