clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Training difference between 2 work #89

Closed 009deep closed 3 years ago

009deep commented 3 years ago

@joonson In paper1 you report a minimum EER of 0.88%, versus 1.18% in paper2.

What is the main difference in training between the two? Judging by the tables, the architecture and loss look the same. Also, I think the weights on GitHub are from paper2. Could you share the weights from paper1?

joonson commented 3 years ago

The improvement came from a much larger batch size, which FP16 training made possible. I also suspect (without evidence) that the 0.88 run had a 'lucky' random seed.

009deep commented 3 years ago

Ah, I see. So is that a separate codebase that uses autocast? Could you provide a sample of how to properly use autocast with the training here?

joonson commented 3 years ago

It's merged into master now. You need to use the --mixedprec flag.
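For readers new to autocast, here is a minimal sketch of a mixed-precision training step using PyTorch's generic AMP utilities (torch.autocast and GradScaler). It is not the repository's own code behind --mixedprec; the helper names make_amp_tools and train_step, and the cross-entropy loss, are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def make_amp_tools(device_type):
    """Enable AMP only on CUDA; on CPU the scaler becomes a no-op (illustrative helper)."""
    use_amp = device_type == "cuda"
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    return use_amp, scaler


def train_step(model, optimizer, scaler, inputs, targets, use_amp):
    """One optimisation step with optional FP16 autocast (hypothetical helper)."""
    optimizer.zero_grad()
    # The forward pass and loss run under autocast, so eligible ops use reduced precision.
    with torch.autocast(device_type=inputs.device.type, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(inputs), targets)
    # Scale the loss so small FP16 gradients do not underflow, then step and update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

Because FP16 activations take roughly half the memory, the same GPU can usually hold a larger batch, which is the effect described above.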

009deep commented 3 years ago

Ah, I see. I haven't used that merge yet; I'll try it out. Could you provide the new weights and settings so I can verify the result?

joonson commented 3 years ago

Here is the model that produces 0.88% EER on Vox1 cleaned.

@009deep Also pls note that 2010.15809 tests using the cleaned version of Vox1 and 2009.14153 tests using the original version of Vox1. This will make a difference of at least 0.1% EER.
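Since the comparison here hinges on EER, the following is a rough sketch of how the metric can be computed from trial scores, assuming higher scores mean "same speaker" and labels are 1 for target trials and 0 for impostor trials. The function name equal_error_rate is hypothetical, and this brute-force threshold sweep is for illustration only; it is not the evaluation code used in either paper.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping every observed score as a threshold.

    scores: similarity scores, higher means "same speaker" (assumption).
    labels: 1 for target (same-speaker) trials, 0 for impostor trials.
    """
    n_target = sum(labels)
    n_impostor = len(labels) - n_target
    if n_target == 0 or n_impostor == 0:
        raise ValueError("need at least one target and one impostor trial")

    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # A trial is accepted when its score is at or above the threshold.
        false_accepts = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
        false_rejects = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
        far = false_accepts / n_impostor
        frr = false_rejects / n_target
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

The sweep is quadratic in the number of trials, so it only suits small lists. It also shows why the trial list matters: changing which trials are scored changes both error rates, so numbers from the cleaned and original Vox1 lists are not directly comparable.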

llearner commented 3 years ago

thanks very much!

009deep commented 3 years ago

Thank you @joonson .

While using ResNetSE34V2, I get the following warning messages:

S.bn_last.weight is not in the model.
S.bn_last.bias is not in the model.
S.bn_last.running_mean is not in the model.
S.bn_last.running_var is not in the model.
S.bn_last.num_batches_tracked is not in the model.

Is there a bn_last layer that was added in training for this model and is not part of master here?

joonson commented 3 years ago

There was an optional output batchnorm in the experimental code, but it has been deleted since. You can ignore this warning.
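To illustrate why the warning is harmless, here is a minimal sketch of checkpoint loading that skips entries the current model does not define. It works on plain mappings of parameter names to values; the function name filter_checkpoint is hypothetical and this is not necessarily how the repository's loader is written.

```python
def filter_checkpoint(model_state, loaded_state):
    """Split checkpoint entries into those the model defines and those it does not.

    model_state: mapping of parameter names the current model defines.
    loaded_state: mapping of parameter names to values read from the checkpoint.
    """
    usable, skipped = {}, []
    for name, value in loaded_state.items():
        if name in model_state:
            usable[name] = value
        else:
            # Extra entries (for example, a layer removed from the code) are reported and ignored.
            print(f"{name} is not in the model.")
            skipped.append(name)
    return usable, skipped
```

With PyTorch, model_state would typically be model.state_dict(), and the usable entries could then be passed to model.load_state_dict(usable, strict=False). Skipped parameters belong to a layer the model never runs, so they do not affect its outputs.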

009deep commented 3 years ago

Thank you so much for your guidance and help. I was able to verify the result with these weights.

One thing I'd note, as I have commented previously: it improves results on the VoxCeleb test set but worsens them on our in-house (out-of-domain, non-YouTube) dataset compared with 2009.14153. I'll open a separate discussion on the ideal way to use out-of-domain data such as LibriSpeech with this training. Thanks.