I can confirm this as well. Even though the provided pre-trained model (or a model trained with the same mechanism from scratch) has a lower EER on the VoxCeleb test sets, its performance on three of my non-public datasets is worse than that of the paper you mentioned.
One thing to highlight, though, is that it uses a different network architecture from the other paper. I was also wondering whether the result would change if a different loss were used, so I am training with a classification loss to verify whether it makes any difference. I'll share my observations.
This is a ResNetSE34L model trained with AM-Softmax (m=0.2, s=30). Give it a try. I have not compared the performance of our models on other datasets.
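For readers unfamiliar with these hyperparameters, below is a minimal sketch of the AM-Softmax objective with m=0.2 and s=30, assuming the standard additive-margin formulation; the class name and initialisation are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxLoss(nn.Module):
    """Additive-margin softmax: cosine logits, margin m subtracted from
    the target-class logit, scaled by s, then ordinary cross-entropy."""
    def __init__(self, embed_dim, n_classes, m=0.2, s=30.0):
        super().__init__()
        self.m, self.s = m, s
        self.weight = nn.Parameter(torch.empty(n_classes, embed_dim))
        nn.init.xavier_normal_(self.weight)

    def forward(self, x, labels):
        # Cosine similarity between L2-normalised embeddings and class weights
        cosine = F.linear(F.normalize(x), F.normalize(self.weight))
        # Subtract the margin from the target-class logit only
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        return F.cross_entropy(self.s * (cosine - self.m * one_hot), labels)
```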
This model seems to have a lower error rate on my datasets than the angleproto one.
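For comparison, here is a minimal sketch of the angular prototypical (angleproto) objective being discussed, assuming the common formulation with a learnable scale and bias; the batch layout and initial values are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularPrototypicalLoss(nn.Module):
    """Angular prototypical loss: each query embedding should be closest
    (in scaled cosine similarity) to its own speaker's prototype."""
    def __init__(self, init_w=10.0, init_b=-5.0):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(init_w))
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, x):
        # x: (n_speakers, n_utterances, embed_dim), n_utterances >= 2
        query = x[:, 0]               # one query utterance per speaker
        proto = x[:, 1:].mean(dim=1)  # prototype from the remaining utterances
        # Pairwise cosine similarities, scaled and shifted
        sims = F.cosine_similarity(query.unsqueeze(1), proto.unsqueeze(0), dim=2)
        logits = self.w.clamp(min=1e-6) * sims + self.b
        labels = torch.arange(x.size(0), device=x.device)
        return F.cross_entropy(logits, labels)
```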
What is your dataset? It is very possible that other methods work better on different datasets. Is there a large difference in performance?
I have three of them, and all of them are private to my organisation. Yes, the difference is noticeable. In numbers, it varies by 2-4% (not 0.2-0.4%), i.e. AM-Softmax has 2-4% lower error. By the way, this work is great, and a great platform to explore different possibilities.
Hello. Thank you for sharing the code! I have my own test sets for speaker verification (two different test sets; I tried both, and the problem occurs in both). I am having trouble evaluating the models pre-trained with metric learning (AP) on my test sets: the EER is too high (22% and 30%). I used your pretrained Fast ResNet and Thin ResNet models. However, when tested with the model from Utterance-level Aggregation for Speaker Recognition in the Wild (Thin ResNet34), trained with a classification objective (Softmax), the EER on my data is significantly lower (10% and 8%). The format I am passing is the same (label (0/1), wav1, wav2). I was wondering if you could share the models pre-trained with classification objectives (AM-Softmax / AAM-Softmax) publicly, so that I can evaluate my data on them.
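For reference, a minimal sketch of scoring a trial list in that label/wav1/wav2 format and computing the EER, assuming cosine-similarity scores between extracted embeddings; the file name, score source, and helper are hypothetical, not part of this repo.

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """EER: the operating point where the false-acceptance and
    false-rejection rates are (approximately) equal."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Trial file: one "label wav1 wav2" per line, label in {0, 1}
labels, pairs = [], []
with open("trials.txt") as f:  # hypothetical file name
    for line in f:
        lab, wav1, wav2 = line.split()
        labels.append(int(lab))
        pairs.append((wav1, wav2))

# scores[i] would be the cosine similarity between the two embeddings
# extracted for pairs[i]; embed() is a hypothetical embedding extractor.
# scores = [cosine(embed(w1), embed(w2)) for w1, w2 in pairs]
# print(f"EER: {100 * compute_eer(labels, scores):.2f}%")
```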