clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License
1.03k stars, 272 forks

Is the thinResNet34 with angular-prototypical the state-of-the-art neural network you know? #54

Closed: zeek-han closed this issue 3 years ago

zeek-han commented 4 years ago

Thank you for open-sourcing this code and for your paper "In defence of metric learning for speaker recognition".

In the paper, you state that networks trained with your "proposed metric learning objective outperform state-of-the-art methods." The best network in the paper is the thin ResNet34 with angular prototypical loss. Is it the best network you know of? Is there any update?

If the question is not rude, could you tell me which network is the best you know of? Thank you.

joonson commented 4 years ago

This paper has better results than ours, but it also uses a much larger network. We have some minor updates (e.g. a change of input dimensions) that give a small improvement in performance; we will release them soon.

zeek-han commented 4 years ago

Thank you for your answer, but the paper you mentioned above (JHU-HLTCOE System for the VoxSRC Speaker Recognition Challenge) was ranked 2nd in VoxSRC 2019. In VoxSRC 2019, the BUT system outperformed the JHU-HLTCOE system, didn't it? The BUT system was the winner of VoxSRC 2019.

I am confused about what the state of the art is. Please let me know. Thank you.

BUT system: https://arxiv.org/pdf/1910.12592.pdf

joonson commented 4 years ago

I know that there are several works that have a lower EER than our baseline, but I don't have an exhaustive list of them. For example, this paper claims an EER as low as 0.5%, but we cannot validate the claim.

zeek-han commented 4 years ago

Thank you very much! The paper that claims an EER of 0.55% surprised me. Honestly, I find the claim hard to accept even though the paper was published at Interspeech, but it is very impressive. Thank you.