clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Shouldn't Prototypical loss use torch.cdist instead of F.pairwise_distance? #169

Open theolepage opened 1 year ago

theolepage commented 1 year ago

Hello,

I have a question regarding the following part used for the Prototypical loss computation. https://github.com/clovaai/voxceleb_trainer/blob/343af8bc9b325a05bcf28772431b83f5b8817f5a/loss/proto.py#L31

From my understanding, output should be similar to the cosine similarity matrix used for the Angular Prototypical loss but based on Euclidean distances instead.

Thus, the output tensor should have a shape of $(N, N)$ (with $N$ the number of samples in the mini-batch) and values at $i, j$ should be the squared Euclidean distance between sample $i$ of out_positive and sample $j$ of out_anchor.

However, F.pairwise_distance computes the element-wise (row-by-row) distance between out_positive and out_anchor, not the distance between every pair of rows drawn from the two sets, which is what torch.cdist computes.
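A minimal sketch of the shape difference (variable names and dimensions are illustrative, not the repo's code):

```python
import torch
import torch.nn.functional as F

N, D = 5, 8  # batch size and embedding dimension (arbitrary values)
out_anchor = torch.randn(N, D)    # e.g. anchor embeddings
out_positive = torch.randn(N, D)  # e.g. positive embeddings

# F.pairwise_distance: one distance per row pair (i, i) -> shape (N,)
rowwise = F.pairwise_distance(out_positive, out_anchor)
print(rowwise.shape)  # torch.Size([5])

# torch.cdist: distance between every pair of rows -> shape (N, N);
# entry (i, j) is the Euclidean distance between positive_i and anchor_j
allpairs = torch.cdist(out_positive, out_anchor, p=2)
print(allpairs.shape)  # torch.Size([5, 5])
```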

Visualization of the difference between F.pairwise_distance and F.cosine_similarity:

![diff_pairwise_distance_and_cosine_similarity](https://user-images.githubusercontent.com/2933110/214818313-a7c61ce7-0e53-4087-9144-39dcdb9076e5.png)

As a result, the output tensor has a shape of $(N, D)$ (with $D$ the output dimension of the model), and the subsequent loss computation is not coherent.

Thanks.

AlexGranger-scn commented 1 year ago

Same confusion here. Have you solved this problem? Thanks for your reply!

deGennesMarc commented 4 months ago

Same issue. In the spirit of @theolepage's suggestion, I replaced the line with `output = -torch.cdist(out_positive, out_anchor, p=2).pow(2)`, but as of today it does not work for me.
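For reference, a self-contained sketch of this replacement in the style of the Prototypical loss (hypothetical names and shapes; the repo's actual code also averages prototypes over several utterances per speaker, which is omitted here):

```python
import torch
import torch.nn as nn

N, D = 5, 8  # batch size and embedding dimension (arbitrary values)
out_anchor = torch.randn(N, D)    # prototypes (one per speaker in the batch)
out_positive = torch.randn(N, D)  # query embeddings

# Negative squared Euclidean distances as (N, N) similarity logits:
# entry (i, j) = -||positive_i - anchor_j||^2
output = -torch.cdist(out_positive, out_anchor, p=2).pow(2)

# Query i should be closest to its own prototype i
label = torch.arange(N)
loss = nn.CrossEntropyLoss()(output, label)
```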

Also, it seems to me that the definition of the prototypical loss in the "In defence of metric learning" paper is wrong: there should be a minus sign in front of the distances $S_{j,k}$ inside the softmax.
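For comparison, the original Prototypical Networks formulation (Snell et al., 2017) negates the squared distances inside the softmax. Writing $S_{j,k}$ for the squared Euclidean distance between query $j$ and prototype $k$ (a reconstruction in the thread's notation, not a quote from either paper), the loss would read:

$$L_P = -\frac{1}{N} \sum_{j=1}^{N} \log \frac{\exp(-S_{j,j})}{\sum_{k=1}^{N} \exp(-S_{j,k})}$$

so that a smaller distance to the correct prototype yields a larger softmax probability.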