clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License
1.06k stars 273 forks

Comparing to your previous paper (VoxCeleb2) #32

Closed: jlian2 closed this 3 years ago

jlian2 commented 4 years ago

In your previous VoxCeleb2 paper, the best EER on VoxCeleb1 is 2.87%, from a model that uses NetVLAD/GhostVLAD, but here you use SAP. Setting aside the Angular Prototypical loss for now, you also get a good result (2.36%) using CosFace/ArcFace, which is better than 2.87% too. What do you think is the factor that leads to this better result: SAP, or training for 500 epochs? The title of the paper is "In defence of metric learning", so I guess you want to emphasise the best result, given by the Angular Prototypical loss, but CosFace/ArcFace also achieves a close result (2.36%).

jlian2 commented 4 years ago

Also, did you try the model without SAP?

joonson commented 3 years ago

Our updated tech report provides more analysis. We tried ASP as well here.