clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition

Multi GPU/CPU support #26

Closed · 009deep closed this 3 years ago

009deep commented 4 years ago

Adding support for multi-GPU and CPU training.

joonson commented 4 years ago

We have checked the code, but found that the use of DataParallel leads to a drop in model performance. I am not sure why this happens, but this appears to be a common problem in PyTorch. We will merge the request once this problem is resolved.
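
For reference, a minimal sketch of the kind of change the PR makes, with a toy model standing in for the repo's SpeakerNet. One plausible cause of the accuracy drop is that DataParallel splits each batch across GPUs, so a metric-learning loss computed inside the wrapped module sees only a sub-batch:

```python
import torch
import torch.nn as nn

# Toy stand-in for the repo's speaker embedding model.
model = nn.Sequential(nn.Linear(40, 512), nn.ReLU(), nn.Linear(512, 512))

if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:
        # DataParallel replicates the module on every visible GPU and
        # splits each input batch along dim 0; gradients are gathered
        # back onto the default device after the backward pass.
        model = nn.DataParallel(model)
    model = model.cuda()

x = torch.randn(8, 40)
if torch.cuda.is_available():
    x = x.cuda()
embeddings = model(x)  # shape (8, 512)
```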

009deep commented 4 years ago

I understand the concern. But, together with a few other changes, I've been able to train a model to an EER as low as 1.5% using the code above. Also, being limited to a single GPU limits model size: for example, the current ResNet model uses a quarter of the filter count of the actual ResNet34, and experimenting with the full size wouldn't be feasible even on a 32 GB GPU. Adding parallelism would enable a much wider range of experimentation. How much does performance differ, by the way? Since angleproto doesn't give the best result for my use case, I haven't pursued getting a similar result with it.
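
As a rough illustration of the memory point: the quarter-width pattern referred to is something like channel widths [16, 32, 64, 128] in the repo's lightweight ResNet versus the standard ResNet34 widths [64, 128, 256, 512]. A toy sketch showing that convolution parameter count grows roughly quadratically with width:

```python
import torch.nn as nn

def conv_stack(filters):
    """Toy stack of 3x3 convs, one layer per stage, to compare widths."""
    layers, in_ch = [], 1
    for ch in filters:
        layers += [nn.Conv2d(in_ch, ch, kernel_size=3, padding=1), nn.ReLU()]
        in_ch = ch
    return nn.Sequential(*layers)

quarter = conv_stack([16, 32, 64, 128])    # quarter-width variant
full    = conv_stack([64, 128, 256, 512])  # standard ResNet34 widths

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(full) / count(quarter))  # ~16x more parameters at full width
```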

zh794390558 commented 4 years ago

How can I reproduce the 1.5% EER? I will test this.

009deep commented 4 years ago

Sorry, I may release those details later in the year. But the training is done using the code in this PR, so if you want to use multi-GPU, feel free to use these changes.

ShaneRun commented 4 years ago

@009deep Thank you for your effort! May I ask a question: did you use data augmentation to get the EER as low as 1.5%, or did you just use the dev set of VoxCeleb2? Thank you in advance!

009deep commented 4 years ago

With the VoxCeleb2 dev set plus augmentation you can get below 1.5%. But it's not just the data; there are other changes, including network architecture, loss, and hyperparameter tuning. As I said, I may release those details later this year.
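
For anyone trying to reproduce this, a generic sketch of additive-noise augmentation at a target SNR, the usual MUSAN-style step; this is an assumption about the approach, not 009deep's exact recipe:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix a noise clip into speech at a target signal-to-noise ratio (dB)."""
    # Loop/crop the noise to match the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    # Scale the noise so that 10*log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(speech ** 2) + 1e-12
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)  # 1 s of placeholder 16 kHz audio
noise = rng.standard_normal(8000)
augmented = mix_at_snr(speech, noise, snr_db=rng.uniform(5, 15))
```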

Shane-pe commented 4 years ago

@009deep Cool, looking forward to your disclosure of those improvements. BTW, I am curious about your data augmentation method: did you use the typical 5-fold data augmentation recipe from Kaldi, or did you try a different method?

009deep commented 4 years ago

I obtained the results using the standard Kaldi recipes.
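
For context, the "5-fold" Kaldi recipe pools the clean training list with four augmented copies of it (reverberation, plus MUSAN noise, music, and babble), usually followed by subsampling. A schematic sketch, with the actual signal processing reduced to a tag:

```python
import random

AUG_TYPES = ["reverb", "noise", "music", "babble"]

def five_fold(utterances):
    """Pool the clean list with one copy per augmentation type (5x total)."""
    pooled = [(utt, "clean") for utt in utterances]
    pooled += [(utt, aug) for aug in AUG_TYPES for utt in utterances]
    return pooled

train = five_fold(["id00012/utt1.wav", "id00012/utt2.wav"])
random.shuffle(train)  # the Kaldi recipe typically subsamples the pooled list
```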

joonson commented 3 years ago

DataParallel is no longer recommended in PyTorch, so we have updated the trainer to use DistributedDataParallel.
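
A minimal sketch of the DistributedDataParallel pattern, assuming one process per GPU launched with torchrun and a toy model in place of the trainer's SpeakerNet:

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK/WORLD_SIZE/LOCAL_RANK for each worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy stand-in for the speaker model; each process holds a full replica.
    model = DDP(nn.Linear(40, 512).cuda(local_rank), device_ids=[local_rank])

    # DistributedSampler gives each process a disjoint shard of the data.
    dataset = torch.utils.data.TensorDataset(torch.randn(1024, 40))
    sampler = torch.utils.data.distributed.DistributedSampler(dataset)
    loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for (x,) in loader:
        loss = model(x.cuda(local_rank)).pow(2).mean()  # placeholder loss
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across processes here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```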