I found that the code uses DataParallel instead of DistributedDataParallel. This results in uneven memory distribution: GPU 0 is full while the other GPUs sit at only about half usage. I'm training on 3090s, and the batch size for each GPU maxes out at 2.
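For reference, here is a minimal sketch of the standard DistributedDataParallel setup I switched to (the `nn.Linear` model is just a stand-in, not this repo's code; launch via `torchrun`):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<num_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Stand-in for the real model. Each process owns exactly one GPU,
# so memory use is balanced across devices (unlike DataParallel,
# where GPU 0 also holds the gathered outputs).
model = nn.Linear(128, 128).cuda(local_rank)
model = DDP(model, device_ids=[local_rank])

# Data loading side, for a dataset object from the repo:
# sampler = torch.utils.data.distributed.DistributedSampler(dataset)
# loader = torch.utils.data.DataLoader(dataset, batch_size=2, sampler=sampler)
# Note batch_size here is per GPU, not global.
```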
When I modify the code to use DistributedDataParallel (roughly as sketched above), the memory of each GPU is almost full (the per-GPU batch size is still 2), and training does become faster. However, I would like to increase the batch size to 4, and that does not seem to fit. I suspect the emahelper in the code takes up a lot of memory, since it keeps and updates the EMA model's parameters on the GPU. Is there any way to solve this problem?
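One workaround I'm considering is keeping the EMA shadow weights on the CPU, so the averaged copy costs no GPU memory and only the `.cpu()` transfer happens per update. A rough sketch of the idea (`CPUEMAHelper` is a hypothetical name of mine, not the repo's actual emahelper):

```python
import torch

class CPUEMAHelper:
    """Hypothetical EMA helper whose shadow weights live on the CPU,
    so the averaged copy uses no GPU memory."""

    def __init__(self, model, mu=0.999):
        self.mu = mu
        # Shadow parameters are stored as CPU tensors.
        self.shadow = {
            name: p.detach().cpu().clone()
            for name, p in model.named_parameters() if p.requires_grad
        }

    @torch.no_grad()
    def update(self, model):
        for name, p in model.named_parameters():
            if p.requires_grad:
                # Copy the live parameter to CPU, then blend it in:
                # shadow = mu * shadow + (1 - mu) * param
                self.shadow[name].mul_(self.mu).add_(
                    p.detach().cpu(), alpha=1.0 - self.mu
                )

    @torch.no_grad()
    def copy_to(self, model):
        # Load the averaged weights back onto the model's device for eval.
        for name, p in model.named_parameters():
            if p.requires_grad:
                p.copy_(self.shadow[name].to(p.device))
```

The trade-off is an extra GPU-to-CPU copy on every update step, so it exchanges some speed for the freed GPU memory.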