facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

Some details about RVQ code #16

Open yangdongchao opened 2 years ago

yangdongchao commented 2 years ago

❓ Questions

Hi, when I try to reproduce the training code based on the parts you have released, I run into a problem with multi-GPU training: https://github.com/facebookresearch/encodec/blob/main/encodec/quantization/core_vq.py#L150 and https://github.com/facebookresearch/encodec/blob/main/encodec/quantization/core_vq.py#L168 cause DDP training to stall, because these lines make the GPUs wait for each other. After deleting them, the model trains fine with torch DDP. However, I don't know whether removing these lines will hurt performance. Can you advise whether they can safely be deleted?
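For reference, here is a minimal sketch of the failure mode being described, assuming the hang comes from ranks diverging on whether they reach the broadcast (the function and variable names below are hypothetical, not EnCodec's API). A collective such as torch.distributed.broadcast must be entered by every rank; a data-dependent guard can leave some ranks blocked forever.

```python
import torch
import torch.distributed as dist

def maybe_sync_codebook(embed: torch.Tensor, needs_update: bool):
    # DEADLOCK-PRONE: with a data-dependent branch, ranks can disagree
    # on whether they reach the collective, and the ranks that do enter
    # it wait forever for the ones that do not.
    # if needs_update:
    #     dist.broadcast(embed, src=0)

    # SAFER: first agree across ranks whether any of them needs the
    # update, so every rank takes the same path into the collective.
    flag = torch.tensor([float(needs_update)], device=embed.device)
    dist.all_reduce(flag, op=dist.ReduceOp.MAX)
    if flag.item() > 0:
        dist.broadcast(embed, src=0)
```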

adefossez commented 2 years ago

Good point, we actually did not use DDP for the training but custom distributed routines. We perform manual averaging of the gradients and the model buffers after the backward call using all reduce operators provided by torch.distributed. See encodec/distrib.py, in particular sync_grad and sync_buffers.
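For readers trying to reproduce this setup, here is a minimal sketch of what such routines might look like, assuming a standard torch.distributed process group. The real implementations live in encodec/distrib.py; the signatures and variable names below are illustrative, not the repo's exact API.

```python
import torch
import torch.distributed as dist

def sync_grad(params):
    # Average gradients across all ranks, called after loss.backward().
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad.data)
            p.grad.data /= dist.get_world_size()

def sync_buffers(module: torch.nn.Module):
    # Average floating-point buffers (running stats, EMA codebooks, ...)
    # so they stay consistent across ranks.
    for buf in module.buffers():
        if buf.dtype.is_floating_point:
            dist.all_reduce(buf.data)
            buf.data /= dist.get_world_size()

# Typical placement in the training loop (hypothetical names):
# loss.backward()
# sync_grad(model.parameters())
# optimizer.step()
# sync_buffers(model)
```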

compressor1212 commented 2 years ago

@yangdongchao did you succeed in training the model?

yangdongchao commented 2 years ago

> @yangdongchao did you succeed in training the model?

Yes, I succeeded in training the model.

compressor1212 commented 2 years ago

@yangdongchao can you share the code if possible?

lizeyu519 commented 1 year ago

@yangdongchao can you share the code? Thank you very much.

xiaonengmiao commented 3 weeks ago

> Good point, we actually did not use DDP for the training but custom distributed routines. We perform manual averaging of the gradients and the model buffers after the backward call using all reduce operators provided by torch.distributed. See encodec/distrib.py, in particular sync_grad and sync_buffers.

Hi, where do we put sync_grad and sync_buffers?