clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

The configuration of HA4 in paper #90

Closed · llearner closed this issue 3 years ago

llearner commented 3 years ago

@joonson Thank you for your contribution to speaker recognition!

Following "The ins and outs of speaker recognition: lessons from VoxSRC 2020", we are trying to reproduce HA4 (H/ASP, AP + softmax) on a single Tesla V100 GPU with mixedprec set to True, but our EER is much higher than 0.88%. Could you please share your configuration? Thanks a lot!

This is our config:

```yaml
model: ResNetSE34Half
n_mels: 64
log_input: True
trainfunc: softmaxproto
batch_size: 400
nPerSpeaker: 2
augment: True
lr: 0.001
lr_decay: 0.75
weight_decay: 5e-5
test_interval: 16
encoder_type: ASP
max_epoch: 256
max_seg_per_spk: 100
```
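In case it helps with checking our setup, this is roughly how we launch training with that config, assuming the settings above are saved as `configs/HA4_repro.yaml` (a filename of our own, not from the repo):

```bash
# Save the settings above as configs/HA4_repro.yaml first (hypothetical path).
# --mixedprec enables mixed precision training on the single V100.
python ./trainSpeakerNet.py \
    --config ./configs/HA4_repro.yaml \
    --mixedprec \
    --save_path exps/HA4_repro
```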

llearner commented 3 years ago

Sorry, I found some problems in my config...

llearner commented 3 years ago

I got an EER of 1.1188% vs the 0.88% reported in "The ins and outs of speaker recognition: lessons from VoxSRC 2020". Has anyone else reproduced it and obtained a different result? Here is my config:

```yaml
model: ResNetSE34V2
n_mels: 64
log_input: True
trainfunc: softmaxproto
batch_size: 450
nPerSpeaker: 2
augment: True
lr: 0.001
lr_decay: 0.75
weight_decay: 5e-5
test_interval: 16
encoder_type: ASP
max_epoch: 256
max_seg_per_spk: 500
eval_frames: 400
margin: 0.2
scale: 30
nOut: 512
```
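For reference, this is a sketch of how I score the final checkpoint on the VoxCeleb1-O trial list with 400 evaluation frames; the config and checkpoint paths below are placeholders for my local setup, not paths from the repo:

```bash
# Evaluate a saved checkpoint on the default VoxCeleb1 test list.
# configs/HA4_repro.yaml and the .model path are local placeholders.
python ./trainSpeakerNet.py --eval \
    --config ./configs/HA4_repro.yaml \
    --initial_model exps/HA4_repro/model/model000000256.model \
    --eval_frames 400 \
    --save_path exps/HA4_repro_eval
```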

Shane-pe commented 3 years ago

@llearner Sorry, I haven't tried it yet, but I will later. You could also try ResNetSE34Half instead of ResNetSE34V2.

joonson commented 3 years ago

See #89