facebookresearch / detr

End-to-End Object Detection with Transformers

Problem with weight initialization when using DDP #544

Open NickyMouseSG opened 1 year ago

NickyMouseSG commented 1 year ago

It seems that you set a different seed for each rank before building the model. This may lead to different parameter initialization for the model replica on each rank. Is this a mistake or a deliberate design choice?

Here is a comment from the PyTorch Lightning DDP advice:

"Setting all the random seeds to the same value. This is important in a distributed training setting. Each rank will get its own set of initial weights. If they don't match up, the gradients will not match either."

I'm not sure how much this affects the final performance, but the current seeding looks faulty in theory.

"""starts from main.py line 115"""
# fix the seed for reproducibility
seed = args.seed + utils.get_rank()
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

model, criterion, postprocessors = build_model(args)
model.to(device)
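
To illustrate the concern, here is a minimal sketch (not code from this repo; the gloo process group and the small nn.Linear standing in for build_model(args) are assumptions) that seeds each rank differently and prints a parameter checksum, which comes out different on every rank:

# minimal sketch (not from this repo): seed each rank differently, build a
# model, and print a parameter checksum -- the values differ across ranks,
# i.e. each replica starts from different initial weights.
import random
import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn

dist.init_process_group("gloo")        # assumes the script is launched with torchrun
rank = dist.get_rank()

seed = 42 + rank                       # same offset scheme as main.py
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

model = nn.Linear(256, 91)             # hypothetical stand-in for build_model(args)
checksum = sum(p.sum().item() for p in model.parameters())
print(f"rank {rank}: parameter checksum before DDP = {checksum:.6f}")
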
xk-huang commented 1 year ago

Check here: all the parameters are forced to be the same when the DDP object is instantiated. https://github.com/pytorch/pytorch/blob/1dba81f56dc33b44d7b0ecc92a039fe32ee80f8d/torch/nn/parallel/distributed.py#LL798C63-L798C63
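
In other words, DistributedDataParallel broadcasts the module's parameters and buffers from rank 0 to all other ranks at construction time, so the per-rank seeds affect things like dropout and data shuffling but not the initial weights. A minimal sketch to verify this (again assuming a gloo process group and an nn.Linear stand-in for the real model):

# minimal sketch (not from this repo): wrap a differently-seeded model in DDP
# and compare parameter checksums before and after construction.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")                 # assumes launch via torchrun
rank = dist.get_rank()

torch.manual_seed(42 + rank)                    # deliberately different per rank
model = nn.Linear(256, 91)                      # hypothetical stand-in for the DETR model

before = sum(p.sum().item() for p in model.parameters())
ddp_model = DDP(model)                          # broadcasts rank 0's params/buffers to all ranks
after = sum(p.sum().item() for p in ddp_model.module.parameters())

print(f"rank {rank}: before DDP = {before:.6f}, after DDP = {after:.6f}")
# "before" differs across ranks; "after" is identical everywhere (synced to rank 0).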