facebookresearch / detr

End-to-End Object Detection with Transformers

Problem with weight initialization when using DDP #544

Open NickyMouseSG opened 1 year ago

NickyMouseSG commented 1 year ago

It seems that you set a different seed for each rank before building the model. This may lead to different parameter initialization for the model replica on each rank. Is this a mistake or a deliberate design choice?

Here is a comment from the PyTorch Lightning DDP advice:

"Setting all the random seeds to the same value. This is important in a distributed training setting. Each rank will get its own set of initial weights. If they don't match up, the gradients will not match either."

I'm not sure how much this affects the final performance, but the current seeding looks faulty in theory.

"""starts from main.py line 115"""
# fix the seed for reproducibility
seed = args.seed + utils.get_rank()
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

model, criterion, postprocessors = build_model(args)
model.to(device)
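
To illustrate the concern, here is a minimal sketch (not code from this repo; the gloo process group and the small nn.Linear standing in for build_model(args) are assumptions) that seeds each rank differently and prints a parameter checksum, which comes out different on every rank:

# minimal sketch (not from this repo): seed each rank differently, build a
# model, and print a parameter checksum -- the values differ across ranks,
# i.e. each replica starts from different initial weights.
import random
import numpy as np
import torch
import torch.distributed as dist
import torch.nn as nn

dist.init_process_group("gloo")        # assumes the script is launched with torchrun
rank = dist.get_rank()

seed = 42 + rank                       # same offset scheme as main.py
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)

model = nn.Linear(256, 91)             # hypothetical stand-in for build_model(args)
checksum = sum(p.sum().item() for p in model.parameters())
print(f"rank {rank}: parameter checksum before DDP = {checksum:.6f}")
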
xk-huang commented 1 year ago

Check here: all the parameters are forced to be the same when the DDP object is instantiated. https://github.com/pytorch/pytorch/blob/1dba81f56dc33b44d7b0ecc92a039fe32ee80f8d/torch/nn/parallel/distributed.py#LL798C63-L798C63
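
In other words, DistributedDataParallel broadcasts the module's parameters and buffers from rank 0 to all other ranks at construction time, so the per-rank seeds affect things like dropout and data shuffling but not the initial weights. A minimal sketch to verify this (again assuming a gloo process group and an nn.Linear stand-in for the real model):

# minimal sketch (not from this repo): wrap a differently-seeded model in DDP
# and compare parameter checksums before and after construction.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")                 # assumes launch via torchrun
rank = dist.get_rank()

torch.manual_seed(42 + rank)                    # deliberately different per rank
model = nn.Linear(256, 91)                      # hypothetical stand-in for the DETR model

before = sum(p.sum().item() for p in model.parameters())
ddp_model = DDP(model)                          # broadcasts rank 0's params/buffers to all ranks
after = sum(p.sum().item() for p in ddp_model.module.parameters())

print(f"rank {rank}: before DDP = {before:.6f}, after DDP = {after:.6f}")
# "before" differs across ranks; "after" is identical everywhere (synced to rank 0).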