This is essentially a question of how to run DDP in PyTorch under SLURM, since fairseq wraps PyTorch's DDP, so it is a more general problem not specific to our repo. On the SLURM side, as far as I know, newer versions of SLURM already set the variables assigned in lines 6 to 11 of run_pretrain_multi.sh, so those lines can be safely commented out. On the PyTorch side, the fairseq config has a distributed_training section where you can experiment with different settings.
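To make this concrete, here is a minimal single-node sketch of a SLURM submission, assuming 4 GPUs on one node; the config directory, config name, and data path are placeholders rather than the repo's actual files, so adjust them to your setup:

```bash
#!/bin/bash
# Minimal single-node sketch, assuming 4 GPUs on one node; the config directory,
# config name, and data path below are placeholders, not the repo's actual files.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:4
#SBATCH --cpus-per-task=16

# Newer SLURM versions already export the environment variables that
# lines 6-11 of run_pretrain_multi.sh would otherwise set by hand,
# so those lines can be commented out here.
fairseq-hydra-train \
  --config-dir /path/to/config/pretrain \
  --config-name contentvec \
  task.data=/path/to/manifests \
  distributed_training.distributed_world_size=4 \
  distributed_training.nprocs_per_node=4
```

The distributed_training.* overrides correspond to the distributed_training section of the fairseq config; for multi-node runs, distributed_world_size should be the total number of GPUs across all nodes, and fairseq can usually infer the node list from the SLURM environment once distributed_training.distributed_port is also set.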
As for the fine-tuning code, it is available in fairseq's repo.
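For reference, a fine-tuning launch roughly follows fairseq's HuBERT recipe (examples/hubert); the sketch below assumes that recipe and uses placeholder paths, so treat it as a starting point rather than a tested command:

```bash
# Rough sketch of ASR fine-tuning, assuming fairseq's HuBERT fine-tuning recipe
# (examples/hubert/config/finetune); all paths below are placeholders.
fairseq-hydra-train \
  --config-dir /path/to/fairseq/examples/hubert/config/finetune \
  --config-name base_10h \
  task.data=/path/to/manifests \
  task.label_dir=/path/to/transcriptions \
  model.w2v_path=/path/to/contentvec_checkpoint.pt
```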
Thank you for your comments! I will try based on your suggestions 🙏
I was able to solve the distributed training problem based on your information! Huge thanks 🙏
Dear friend, I am trying to fine-tune contentvec. Did you fine-tune with the training code directly, or did you write your own code based on fairseq? Could you tell me how to fine-tune contentvec using this code? Sincere thanks if you can answer.
Hello, thank you very much for providing this great training code. Our team is currently trying to train a new contentvec model to reduce language dependency, so we are attempting to train on data other than LibriSpeech.
However, I have a question about an error in a multi-node environment. Although we adjusted the PROC_PER_NODE variable in run_pretrain_multi.sh to match the number of GPUs currently available, only one GPU is detected during actual training, and the same thing happens even when we follow the example (LibriSpeech) exactly. I have attached a log of the issue. I would like to ask whether we are doing something wrong, and whether we should refer to the fairseq code instead. Also, do you plan to provide an implementation of the fine-tuning code in the future?