antoyang / TubeDETR

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Apache License 2.0

Problem about weight initialization using DDP #15

Closed jingwangsg closed 1 year ago

jingwangsg commented 2 years ago

Hi Antoine, it seems that you set a different seed for each rank before building the model. This may lead to different parameter initialization for the model replica on each rank. Is this a mistake or a deliberate design choice?

Here is a note from the PyTorch Lightning DDP advice:

Setting all the random seeds to the same value. This is important in a distributed training setting. Each rank will get its own set of initial weights. If they don't match up, the gradients will not match either, leading to training that may not converge.

"""starts from main.py line 347"""
# fix the seed for reproducibility
seed = args.seed + dist.get_rank()
torch.manual_seed(seed)
np.random.seed(seed)
random.seed(seed)
# torch.set_deterministic(True)
torch.use_deterministic_algorithms(True)

# Build the model
model, criterion, weight_dict = build_model(args)
model.to(device)
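
For reference, a minimal sketch of the kind of fix I have in mind (not the repo's actual code; `args`, `device` and `build_model` are the same names used in main.py): seed every rank with the same value before building the model, and only offset by rank afterwards, so that data-related randomness still differs across processes.

import random
import numpy as np
import torch
import torch.distributed as dist

# Same seed on every rank -> identical parameter initialization
torch.manual_seed(args.seed)
np.random.seed(args.seed)
random.seed(args.seed)
torch.use_deterministic_algorithms(True)

# Build the model with identical initial weights on every rank
model, criterion, weight_dict = build_model(args)
model.to(device)

# Offset by rank only for randomness that should differ across processes
# (e.g. data augmentation), not for parameter initialization
rank_seed = args.seed + dist.get_rank()
np.random.seed(rank_seed)
random.seed(rank_seed)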
antoyang commented 2 years ago

Hi, this is not deliberate: if I remember correctly, this part of the code comes from the MDETR codebase, which itself comes from the DETR codebase.

jingwangsg commented 2 years ago

Hi, thank you for the reminder. I opened a new issue in the original DETR repo. I'm not sure how this setting affects the final performance, but it should be fixed if it is indeed a bug.

antoyang commented 1 year ago

It seems this is resolved: https://github.com/facebookresearch/detr/issues/544.
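
For anyone finding this later: as far as I understand the linked discussion, the per-rank seed turns out to be harmless for the weights themselves, because DistributedDataParallel broadcasts the rank-0 parameters and buffers to all other ranks at construction time. A rough sanity check (a sketch, assuming the script is launched with torchrun so the default process group can be initialized) could look like this:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("gloo")  # assumes env:// variables set by torchrun

# Each rank builds its copy with a different seed, as in DETR's main.py
torch.manual_seed(42 + dist.get_rank())
model = torch.nn.Linear(8, 8)

# Wrapping in DDP broadcasts rank 0's parameters/buffers to all ranks,
# so the replicas start from identical weights regardless of the seed
ddp_model = DDP(model)

# Compare a parameter checksum across ranks
checksum = torch.stack([p.sum() for p in ddp_model.parameters()]).sum()
gathered = [torch.zeros_like(checksum) for _ in range(dist.get_world_size())]
dist.all_gather(gathered, checksum)
if dist.get_rank() == 0:
    print("identical init:", all(torch.equal(gathered[0], g) for g in gathered))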