NickyMouseSG opened this issue 1 year ago
Check here: all the parameters are forced to be the same when the DDP object is instantiated. https://github.com/pytorch/pytorch/blob/1dba81f56dc33b44d7b0ecc92a039fe32ee80f8d/torch/nn/parallel/distributed.py#LL798C63-L798C63
It seems that you set a different seed for each rank before building the model. This can lead to a different parameter initialization for the model replica on each rank. Is this a mistake or a deliberate design choice?
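To make the question concrete, here is a minimal sketch (my own illustration, not code from the repo; it assumes a CPU-only setup with the gloo backend and a hypothetical `demo` entry point). Each rank seeds its RNG differently and so initializes its local copy differently, but wrapping the module in DDP broadcasts rank 0's parameters to the other ranks, which is what the linked line enforces:

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def demo(rank: int, world_size: int):
    # Minimal process-group setup; MASTER_ADDR/MASTER_PORT are set below.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Different seed per rank -> at this point each rank's local model
    # has different randomly initialized weights.
    torch.manual_seed(1234 + rank)
    model = nn.Linear(8, 4)

    # Constructing the DDP wrapper broadcasts rank 0's parameters and
    # buffers to every other rank, so after this line all replicas hold
    # rank 0's weights regardless of the per-rank seeds above.
    ddp_model = DDP(model)

    # Sanity check: this checksum should print identically on all ranks.
    flat = torch.cat([p.detach().flatten() for p in ddp_model.parameters()])
    print(f"rank {rank}: param checksum = {flat.sum().item():.6f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    mp.spawn(demo, args=(world_size,), nprocs=world_size)
```

So if the intent of the per-rank seed was to diversify something other than the model weights (e.g. data shuffling or dropout), the broadcast makes the weight divergence harmless; if identical initialization was assumed to come from the seed itself, it is the DDP constructor doing the work instead.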
Here is a comment from the PyTorch Lightning DDP advice: