clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Similarity or distance metric to calculate scores? #100

Closed Nada-gh closed 2 years ago

Nada-gh commented 3 years ago

Hello, I noticed you used the pairwise_distance function to calculate the similarity scores for utterance pairs. According to the official documentation of that function, it calculates the distance (which is different from similarity) between feature vectors. Could you please explain? Thank you.

zabir-nabil commented 2 years ago

I think you can use a similarity function too, for example,

sim = F.cosine_similarity(ref_feat.unsqueeze(-1), com_feat.unsqueeze(-1).transpose(0,2)).detach().cpu().numpy()
score = numpy.mean(sim) # note: no multiplication by -1 here, because this is a similarity-based score, not a distance

Cosine similarity sometimes gives slightly better results.
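
To make the sign convention concrete, here is a minimal, self-contained sketch that scores one trial both ways. The embeddings are random, hypothetical stand-ins, and `torch.cdist` is used as the Euclidean distance for illustration; it is an assumption, not necessarily the exact call in the repository.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the embeddings of one trial:
# 3 reference segments and 4 comparison segments, 8 dimensions each.
ref_feat = F.normalize(torch.randn(3, 8), p=2, dim=1)
com_feat = F.normalize(torch.randn(4, 8), p=2, dim=1)

# Distance-based score: a smaller distance means a closer match, so the mean
# is negated to keep the convention "higher score = more likely same speaker".
dist = torch.cdist(ref_feat, com_feat)  # (3, 4) Euclidean distances
score_from_distance = -dist.mean().item()

# Similarity-based score: already "higher = closer", so no sign flip is needed.
sim = F.cosine_similarity(
    ref_feat.unsqueeze(-1),                  # (3, 8, 1)
    com_feat.unsqueeze(-1).transpose(0, 2),  # (1, 8, 4)
    dim=1,
)  # (3, 4), one value per (reference, comparison) segment pair
score_from_similarity = sim.mean().item()

print(score_from_distance, score_from_similarity)
```

Either score can be thresholded for verification, as long as the "higher means more similar" convention is kept consistent.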

Jungjee commented 2 years ago

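Closing as this issue has been inactive for over six months. When the speaker embeddings are normalised to have a length of 1, a simple dot product gives their cosine similarity, which can be thought of as 1 - cosine distance. In terms of the Euclidean distance d between the two unit-length embeddings, the similarity equals 1 - d²/2, so for a given pair a larger similarity always corresponds to a smaller distance.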
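
The relationship between the dot product and the distance of unit-length vectors can be checked with a few lines of plain Python. The vectors and helper names below are made up for illustration; only the identity ||a - b||² = 2 - 2 (a · b) for unit vectors is being demonstrated.

```python
import math

def l2_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two arbitrary example vectors, normalised to length 1.
a = l2_normalise([1.0, 2.0, 3.0])
b = l2_normalise([2.0, 0.5, 1.0])

similarity = dot(a, b)      # cosine similarity, since both have length 1
distance = euclidean(a, b)

# For unit-length vectors: ||a - b||^2 = 2 - 2 * (a . b)
assert math.isclose(similarity, 1.0 - distance ** 2 / 2.0, abs_tol=1e-12)
```

Because the mapping from distance to similarity is strictly decreasing, ranking pairs by dot product or by negated distance gives the same order for single embedding pairs.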