clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Add distributed training #61

Closed · joonson closed this issue 3 years ago

joonson commented 3 years ago
wuqiangch commented 3 years ago

@joonson Have you added it to the code? I can't find it. Thanks!

joonson commented 3 years ago

Not yet, hence the open issue

joonson commented 3 years ago

Added in a new branch distributed. This is the configuration used to produce EER 1.1771 in the released pre-trained model. Note that 8 GPUs were used to train this model, so test_interval and max_epoch must be changed accordingly if you want to use a different number of GPUs.
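
To make the note about test_interval and max_epoch concrete, here is a minimal sketch of the arithmetic, assuming (as in typical data-parallel setups, not something spelled out in this thread) that each GPU processes its own shard of the data and gradients are averaged across GPUs. The dataset size and per-GPU batch size below are placeholders, not values from the linked configuration.

```python
# Hedged illustration (placeholder numbers, not values from the linked
# config): why test_interval and max_epoch depend on the GPU count when
# the training set is sharded across GPUs during distributed training.

DATASET_SIZE = 1_000_000   # placeholder number of training utterances
PER_GPU_BATCH = 200        # placeholder per-GPU batch size

def schedule_stats(n_gpus):
    """Effective batch size and optimiser steps per epoch for n_gpus."""
    effective_batch = PER_GPU_BATCH * n_gpus
    steps_per_epoch = DATASET_SIZE // effective_batch
    return effective_batch, steps_per_epoch

for n in (8, 4, 2, 1):
    batch, steps = schedule_stats(n)
    print(f"{n} GPU(s): effective batch {batch:5d}, ~{steps} steps/epoch")

# Going from 8 GPUs to 2 roughly quadruples the steps per epoch while
# shrinking the effective batch by 4x, so a schedule expressed in epochs
# (test_interval, max_epoch) no longer means the same thing.
```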

009deep commented 3 years ago

@joonson 8 different GPUs, or 8 cores on a single GPU?

009deep commented 3 years ago

Also, in this comment there is a separate config from the one mentioned above. Do both of them produce the same results?

zeek-han commented 3 years ago

Thank you as always. You said we must change the configuration when we use the code in a different GPU environment. Could you recommend a configuration for 2 GPUs?

joonson commented 3 years ago

@zeek-han The number of distributed GPUs affects the training in the same way changing the batch size does. You will need to experiment to find out what works best.
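
As a hedged sketch of that equivalence (placeholder numbers again, and not an official recommendation): one option on 2 GPUs is to raise the per-GPU batch size so the effective batch matches the 8-GPU recipe, memory permitting; otherwise you accept a smaller effective batch and tune the schedule empirically, as suggested above.

```python
# Hypothetical helper: choose a per-GPU batch size on a smaller machine so
# that the effective batch (per-GPU batch x number of GPUs) matches the
# 8-GPU recipe. Whether memory allows this, and whether it is actually the
# best setting, still has to be checked experimentally.

def per_gpu_batch_for(target_effective_batch, n_gpus):
    if target_effective_batch % n_gpus:
        raise ValueError("effective batch not divisible by GPU count")
    return target_effective_batch // n_gpus

# Placeholder reference: per-GPU batch of 200 on 8 GPUs.
REFERENCE_EFFECTIVE_BATCH = 200 * 8

print(per_gpu_batch_for(REFERENCE_EFFECTIVE_BATCH, n_gpus=2))  # -> 800
```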

joonson commented 3 years ago

@009deep Very similar, but not exactly the same.

lawlict commented 3 years ago

> Added in a new branch distributed. This is the configuration used to produce EER 1.1771 in the released pre-trained model. Note that 8 GPUs were used to train this model, so test_interval and max_epoch must be changed accordingly if you want to use a different number of GPUs.

Hi, @joonson. I tested this configuration, but the EER is 2.4%. Note that it only trains for 36 epochs. Is that right?

joonson commented 3 years ago

@lawlict Did you train using distributed training with 8 GPUs?

lawlict commented 3 years ago

@joonson Yes, of course. I think the number of training steps is too small.

> Also, in this comment there is a separate config from the one mentioned above. Do both of them produce the same results?

Another configuration here takes many more training steps, but I don't have time to test it now.