clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Cannot achieve paper's RawNet3 results using an official recipe #160

Open happyjin opened 1 year ago

happyjin commented 1 year ago

Dear author,

I cannot reproduce the paper's results using the RawNet3 script, which should give an EER of 0.8932%. I am wondering whether the paper's result is wrong. Could you please upload a recipe so that we can reproduce the result reported in the paper?
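For readers comparing the numbers in this thread, here is a minimal sketch of how an equal error rate (EER) can be computed from trial scores. It is not the evaluation code of this repo or of the paper; the function name `compute_eer` and the convention that higher scores mean "same speaker" are assumptions made for illustration, and tied scores are not treated specially.

```python
import numpy as np


def compute_eer(scores, labels):
    """Return the equal error rate as a fraction (multiply by 100 for %).

    scores: one similarity score per trial (assumed: higher = same speaker).
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    # Sort trials from the highest score to the lowest.
    order = np.argsort(-scores)
    labels = labels[order]

    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Operating point k accepts the k highest-scoring trials.
    false_accept = np.cumsum(1 - labels) / n_nontarget
    false_reject = 1.0 - np.cumsum(labels) / n_target

    # Add the point where nothing is accepted (k = 0).
    false_accept = np.concatenate([[0.0], false_accept])
    false_reject = np.concatenate([[1.0], false_reject])

    # The EER is where the two error rates cross; take the closest point.
    idx = np.argmin(np.abs(false_accept - false_reject))
    return float((false_accept[idx] + false_reject[idx]) / 2.0)
```

With only a handful of trials the estimate is coarse; on a full trial list the crossing point is much finer, which is why differences such as 0.89% versus 0.98% can be measured at all.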

Jungjee commented 1 year ago

Hi, can you share your results?

I cannot share the exact training recipe because it includes internal code, which is one of the reasons why I shared the trained weight parameters. However, the model architecture is exactly the same, and you should be getting similar results.

JunLi0514 commented 7 months ago

Hi, we're encountering a similar issue. The pre-trained RawNet3 achieves an EER of 0.9809% with full-length enrollment and test utterances. But when we train RawNet3 on the VoxCeleb1 & 2 dev sets, using noise and reverberation addition as augmentation, the EER is 1.20% after 40 epochs. Moreover, the EER rises to 1.4% after applying speed perturbation to the VoxCeleb2 dev set.

During training, the mixedprec and distributed arguments are used to accelerate training.

Could you please provide some advice on how to address this? Thank you!

Jungjee commented 7 months ago

@JunLi0514, hi, thanks for reporting your status. Regarding the EER of 0.98%, did you follow the same setup, segmenting each utterance into ten 4-second segments? If you input the full-length utterance, the setup differs from what we did, so the result might be affected.
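As a rough illustration of the "ten 4-second segments" setup, the sketch below cuts an utterance into ten evenly spaced 4-second windows and scores a trial by averaging the cosine similarity over all enrollment/test segment pairs. The 16 kHz sample rate, the even spacing, the repeat-padding of short utterances, the pairwise averaging, and the helper names `split_into_segments` and `mean_cosine_score` are all assumptions for illustration; the thread only states the number and length of the segments.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: 16 kHz audio


def split_into_segments(wave, num_segments=10, seconds=4.0,
                        sample_rate=SAMPLE_RATE):
    """Cut a 1-D waveform into evenly spaced fixed-length segments.

    Returns an array of shape (num_segments, seconds * sample_rate).
    """
    seg_len = int(seconds * sample_rate)
    wave = np.asarray(wave, dtype=np.float32)

    # Assumption: utterances shorter than one segment are repeated to fit.
    if len(wave) < seg_len:
        wave = np.resize(wave, seg_len)

    # Evenly spaced start offsets from the beginning to the last valid start.
    starts = np.linspace(0, len(wave) - seg_len, num=num_segments)
    return np.stack([wave[int(s):int(s) + seg_len] for s in starts])


def mean_cosine_score(enroll_emb, test_emb):
    """Average cosine similarity over all enrollment/test segment pairs.

    Both inputs have shape (num_segments, embedding_dim).
    """
    e = enroll_emb / np.linalg.norm(enroll_emb, axis=1, keepdims=True)
    t = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    return float((e @ t.T).mean())
```

In use, each row returned by `split_into_segments` would be passed through the speaker model to get one embedding per segment, and the two resulting embedding matrices would be compared with `mean_cosine_score`. Feeding the full-length utterance as a single input instead yields one embedding per utterance, which is a different test condition.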

Note that I trained RawNet3 using another codebase. Only the model architecture has been added to this repo (VoxCeleb_trainer).

FYI, following my change of affiliation, I recently developed a reproducible RawNet3 recipe in ESPnet2. It achieved an EER of 0.73%, and the result was reproduced several times when tested.

JunLi0514 commented 7 months ago

Hi @Jungjee, thank you for your quick reply! The work is impressive, and the training recipe described in the paper is detailed. Following your kind advice, we'll test the pretrained model with 4-second segments and try training with ESPnet2 ^ ^