Closed: yw155 closed this issue 5 years ago
Hi yw155,
Thanks for reaching out! Regarding the multi-GPU training:
You can use torch.distributed to perform the distributed training: use the barrier() function so that GPUs 1-7 wait for GPU 0 to perform the quantization, and the broadcast() function to broadcast the centroids and assignments obtained by GPU 0 to GPUs 1-7.
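Something along these lines should work (a minimal sketch only, assuming the default process group is already initialised, e.g. via torch.distributed.launch; `quantize_fn`, `n_centroids` and `block_size` below are hypothetical placeholders, not names from this repo):

```python
import torch
import torch.distributed as dist

def quantize_and_sync(weight, quantize_fn, n_centroids, block_size):
    """Quantize `weight` on GPU 0 only, then share the result with GPUs 1-7."""
    rank = dist.get_rank()
    device = weight.device
    n_blocks = weight.numel() // block_size

    if rank == 0:
        # GPU 0 computes the codebook (centroids) and the per-block assignments.
        centroids, assignments = quantize_fn(weight)
    else:
        # GPUs 1-7 only allocate receive buffers of the expected shapes.
        centroids = torch.empty(n_centroids, block_size, device=device)
        assignments = torch.empty(n_blocks, dtype=torch.long, device=device)

    # GPUs 1-7 wait here until GPU 0 has finished quantizing.
    dist.barrier()

    # GPU 0 broadcasts its centroids and assignments to GPUs 1-7.
    dist.broadcast(centroids, src=0)
    dist.broadcast(assignments, src=0)

    return centroids, assignments
```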
Hope this helps,
Pierre
Thank you very much.
@yw155 Hi, did you implement multi-gpu training?
Hi @pierrestock, I would like to ask how to implement the multi-GPU training code. Which parts can be run on multiple GPUs, e.g. quantization, fine-tuning, and global fine-tuning? Thank you.