clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Pretrained models for AM/AMS-Softmax #17

Closed mussakhojayeva closed 4 years ago

mussakhojayeva commented 4 years ago

Hello. Thank you for sharing the code! I have my own test sets for speaker verification (two different test sets; I tried both, and the problem occurs in both). I am having trouble evaluating the models pre-trained with metric learning (AP) on my test sets: the EER is too high (22% and 30%). I used your pretrained Fast ResNet and Thin ResNet models. However, when tested with the Thin ResNet34 model from "Utterance-level Aggregation for Speaker Recognition in the Wild", which uses a classification objective (Softmax), the EER on my data is significantly lower (10% and 8%). The format I am passing is the same (label (0/1), wav1, wav2). I was wondering if you could share the models pre-trained with classification objectives (AM/AMS-Softmax) publicly, so that I can evaluate my data on them.
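For context, this is how I compute EER from scored trials in that format (a minimal sketch assuming scikit-learn is available; the scores are cosine similarities between embeddings, and the numbers below are only illustrative):

```python
# Hypothetical sketch: EER from a trial list of "label wav1 wav2" pairs,
# given one similarity score per pair. Not the repo's exact scoring code.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal error rate: the operating point where FAR == FRR."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Example trial: "1 spk1/a.wav spk1/b.wav" (1 = same speaker, 0 = different)
labels = [1, 0, 1, 0]               # illustrative ground-truth labels
scores = [0.82, 0.31, 0.77, 0.45]   # e.g. cosine similarity of embeddings
print(f"EER: {100 * compute_eer(labels, scores):.2f}%")
```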

009deep commented 4 years ago

I can confirm this as well. Even though the provided pre-trained model (or a model trained from scratch with a similar recipe) has a lower EER on the VoxCeleb test sets, its performance on three of my non-public datasets is worse than that of the paper you mentioned.

One thing to highlight, though, is that it uses a different network architecture compared to the other paper. I was also wondering whether the result would change if a different loss were used. I am training with a classification loss to verify whether it makes any difference; I'll share my observations.

joonson commented 4 years ago

This is a ResNetSE34L model trained with AM-Softmax (m=0.2, s=30). Give it a try. I have not compared the performance of our models on other datasets.

V7907B.model.zip
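For reference, AM-Softmax subtracts a margin m from the target-class cosine before scaling by s. Below is a minimal PyTorch sketch with the hyperparameters quoted above; the dimensions and initialisation are illustrative, and this is not necessarily identical to the implementation shipped in this repo:

```python
# Minimal AM-Softmax sketch (m=0.2, s=30), for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    def __init__(self, embed_dim, n_classes, m=0.2, s=30.0):
        super().__init__()
        self.m, self.s = m, s
        self.weight = nn.Parameter(torch.randn(n_classes, embed_dim))
        nn.init.xavier_normal_(self.weight)

    def forward(self, x, labels):
        # Cosine similarity between L2-normalised embeddings and class weights
        cosine = F.linear(F.normalize(x), F.normalize(self.weight))
        # Subtract the margin m from the target-class cosine only, then scale
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * (cosine - self.m * one_hot)
        return F.cross_entropy(logits, labels)

# Usage (illustrative sizes): loss = AMSoftmaxLoss(512, 5994)(embeddings, speaker_ids)
```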

009deep commented 4 years ago

It turns out this model has a lower error on my datasets compared to the angleproto one.

joonson commented 4 years ago

What are your datasets? It is very possible that other methods work better on different datasets. Is there a large difference in performance?

009deep commented 4 years ago

I have three of them, and all are private to my org. Yes, the difference is noticeable. To put it in numbers, it varies by 2-4% (not 0.2-0.4%), i.e. AM-Softmax has 2-4% lower error. By the way, this is great work, and a great platform to explore different possibilities.