Closed 009deep closed 3 years ago
It was made possible by the much larger batch size that FP16 training allows. I also think (without evidence) that the 0.88 one had a 'lucky' random seed.
Ah, I see. So is that a separate codebase that uses autocast? Could you provide a sample of how to properly use autocast with training here?
It's merged into master now. You need to use the --mixedprec flag.
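For reference, mixed-precision training in PyTorch is typically done with `autocast` plus a `GradScaler`. The sketch below is a minimal, generic training step in that style; it is not the repo's actual trainer code, and the model, optimizer, and loss here are illustrative stand-ins.

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Illustrative stand-ins; the real trainer builds its model/loss from config.
model = torch.nn.Linear(40, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

# The scaler rescales the loss so FP16 gradients do not underflow.
# Mixed precision is only enabled when CUDA is actually available.
scaler = GradScaler(enabled=torch.cuda.is_available())

def train_step(x, y):
    optimizer.zero_grad()
    # Run the forward pass in mixed precision where supported.
    with autocast(enabled=torch.cuda.is_available()):
        loss = criterion(model(x), y)
    # Backward on the scaled loss, then unscale gradients and step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

On a machine without CUDA both `autocast` and the scaler become no-ops, so the same loop runs unchanged in full precision.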
Ah, I see. I haven't used that merge yet; I'll try it out. Could you provide the new weights and settings to verify the result?
Here is the model that produces 0.88% EER on Vox1 cleaned.
@009deep Also please note that 2010.15809 tests using the cleaned version of Vox1, while 2009.14153 tests using the original version of Vox1. This will make a difference of at least 0.1% EER.
thanks very much!
Thank you @joonson .
While using ResNetSE34V2, I get the following warning messages: S.bn_last.weight is not in the model. S.bn_last.bias is not in the model. S.bn_last.running_mean is not in the model. S.bn_last.running_var is not in the model. S.bn_last.num_batches_tracked is not in the model.
Is there a bn_last layer which was added during training for this model and is not part of master here?
There was an optional output batchnorm in the experimental code, but it has been deleted since. You can ignore this warning.
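In plain PyTorch terms, these warnings mean the checkpoint contains keys that the current model no longer has. A sketch of one way to load such a checkpoint without errors is `load_state_dict(..., strict=False)`, which reports (rather than rejects) the leftover keys; the repo's own loader filters keys manually, so this is an alternative illustration, with a toy model standing in for ResNetSE34V2.

```python
import torch

# Toy stand-in model; the real one is ResNetSE34V2 from the repo.
model = torch.nn.Sequential(torch.nn.Linear(4, 4))

# Simulate a checkpoint that still contains a deleted layer's parameters.
state = model.state_dict()
state["bn_last.weight"] = torch.ones(4)  # key no longer present in the model

# strict=False loads all matching keys and returns the mismatches
# instead of raising an error.
result = model.load_state_dict(state, strict=False)
print(result.unexpected_keys)  # the stale bn_last key, safe to ignore
```

Any key listed in `unexpected_keys` was simply skipped, so the rest of the weights load normally.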
Thank you so much for your guidance and help. I was able to verify the result with these weights.
One thing I'd note, as I have commented previously: it improves the result for the VoxCeleb test set but worsens it for an in-house (out-of-domain, non-YouTube) dataset compared with 2009.14153. I'll raise a separate discussion on the ideal way to use out-of-domain data such as LibriSpeech with this training. Thanks.
@joonson In paper1 you have a minimum EER of 0.88 vs 1.18 in paper2. What is the main difference in training there? It looks the same per the table in terms of architecture and loss. Also, I think the weights on GitHub are from paper2. Could you share the weights from paper1?