clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Can't replicate the results #24

Closed: mussakhojayeva closed this issue 4 years ago

mussakhojayeva commented 4 years ago

Hi! I trained the model with the provided parameters. At the end of training the log shows IT 500, LR 0.000081, TEER/T1 66.35, TLOSS 1.446545, VEER 2.4920, and the best checkpoint was at IT 480, LR 0.000090, TEER/T1 66.39, TLOSS 1.444428, VEER 2.4390.
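For reading these numbers: VEER appears to be the equal error rate (EER, in %) on the test list, i.e. the operating point where the false acceptance rate equals the false rejection rate. Below is a minimal sketch of computing it from trial scores, assuming higher scores mean "same speaker" and labels are 1 for same-speaker trials. `compute_eer` is a hypothetical helper, not the repository's own scoring code.

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate in percent.

    scores: one score per trial, higher = more likely the same speaker.
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Sort trials by descending score and sweep the threshold down through them.
    labels = labels[np.argsort(-scores, kind="stable")]
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target
    # After accepting the top k trials (k = 0..N):
    accepted_targets = np.concatenate(([0], np.cumsum(labels)))
    accepted_nontargets = np.concatenate(([0], np.cumsum(1 - labels)))
    far = accepted_nontargets / n_nontarget       # false acceptance rate
    frr = 1.0 - accepted_targets / n_target       # false rejection rate
    # EER: the threshold where FAR and FRR are closest; report their mean.
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0
```

Perfectly separated scores give an EER of 0; scores that carry no information give about 50.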

I used the following command: python ./trainSpeakerNet.py --model ResNetSE34L --encoder SAP --trainfunc angleproto --optimizer adam --save_path data/exp1 --batch_size 800 --max_frames 200

I also used the provided meta_files for training (5994 speakers) and testing.

Am I doing something wrong?

009deep commented 4 years ago

What value of nSpeakers did you use?

mussakhojayeva commented 4 years ago

For angleproto, I used nSpeakers = 2.
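To illustrate why the angular prototypical loss needs at least 2 here: assuming nSpeakers sets how many utterances each speaker contributes to a batch (my reading; the thread does not state the flag's exact semantics), one utterance per speaker serves as the query and the rest form that speaker's prototype. Below is a minimal NumPy sketch of the loss. `angular_prototypical_loss` is a hypothetical name, the scale `w` is fixed here although it is normally learned, and some formulations also add a learnable bias, which is omitted because a bias shared by all logits leaves the softmax unchanged.

```python
import numpy as np

def angular_prototypical_loss(embeddings, w=10.0):
    """Loss for a batch shaped (n_speakers, n_utterances_per_speaker, dim).

    The last utterance of each speaker is the query; the mean of the others
    is the prototype (with 2 utterances, the prototype is a single utterance).
    """
    x = np.asarray(embeddings, dtype=float)
    assert x.shape[1] >= 2, "need one query and at least one support utterance"
    prototypes = x[:, :-1, :].mean(axis=1)   # (n_speakers, dim)
    queries = x[:, -1, :]                    # (n_speakers, dim)
    # Cosine similarity between every query and every prototype.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = w * (q @ p.T)                   # (n_speakers, n_speakers)
    # Softmax cross-entropy: query i should match prototype i.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Every other speaker in the batch acts as a negative, so the loss is low only when each query is closer to its own prototype than to any other.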

joonson commented 4 years ago

The results in the paper use max_frames=400 for evaluation and report the best result (lowest EER) over the 500 epochs. What do you get if you evaluate with this setting? You can keep the model you trained with the shorter max_frames.
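To make the training/evaluation split concrete, here is a rough sketch of what max_frames means in audio samples and of taking several evenly spaced fixed-length crops from a test utterance. It assumes 16 kHz audio, a 10 ms frame shift (160 samples) and a 25 ms window (400 samples); `frames_to_samples`, `eval_crops` and `num_eval` are hypothetical names, and the repository's own loader may pad or crop differently.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumption: 16 kHz audio
HOP = 160            # assumption: 10 ms frame shift
WIN = 400            # assumption: 25 ms analysis window

def frames_to_samples(n_frames):
    """Samples needed so n_frames windows of WIN samples, HOP apart, fit."""
    return (n_frames - 1) * HOP + WIN

def eval_crops(audio, max_frames=400, num_eval=10):
    """Return num_eval evenly spaced crops of max_frames frames each."""
    length = frames_to_samples(max_frames)
    audio = np.asarray(audio, dtype=float)
    if len(audio) < length:
        # Short clip: repeat it cyclically up to the crop length.
        audio = np.resize(audio, length)
    starts = np.linspace(0, len(audio) - length, num=num_eval).astype(int)
    return np.stack([audio[s:s + length] for s in starts])
```

Under these assumptions, 200 frames is 32240 samples (about 2 s) and 400 frames is 64240 samples (about 4 s), so a model trained on 2 s crops can still be scored on longer 4 s crops.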

mussakhojayeva commented 4 years ago

It worked! Thanks!

jlian2 commented 4 years ago

@mussakhojayeva Hi, may I ask which max_frames value you ended up using?

mussakhojayeva commented 4 years ago

I used max_frames=200 for training and 400 for testing.

jlian2 commented 4 years ago

Thanks!