Open · tomsons22 opened 3 months ago
Also very interested in an answer to this, as I'm seeing conflicting documentation online here too, e.g. this snippet from the W&B docs:
```python
def main():
    # Setting all the random seeds to the same value.
    # This is important in a distributed training setting.
    # Each rank will get its own set of initial weights.
    # If they don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)
```
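For context, here is a minimal sketch of how that call would sit in a full DDP run. The `ToyModel`, the random tensors, and the two-GPU `Trainer` settings are just placeholders I picked to make it self-contained, not anything from the docs beyond the seeding line:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def main():
    # Seed every process with the same value *before* the model is built,
    # so each rank starts from identical initial weights (the pattern from
    # the W&B snippet above).
    pl.seed_everything(1)

    model = ToyModel()
    data = DataLoader(
        TensorDataset(torch.randn(256, 32), torch.randn(256, 1)),
        batch_size=32,
    )

    # Placeholder settings: assumes a machine with 2 GPUs.
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
    trainer.fit(model, data)


if __name__ == "__main__":
    main()
```

As I understand it, with `strategy="ddp"` each rank runs the whole script, so the seeding line executes once per process.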
📚 Documentation
I'm training a model in a multi-GPU environment using the DDP strategy. Looking here, I see that it is important to call `L.seed_everything(...)` to make sure the model is initialized the same way across devices. However, here it says that this is not needed. I tried a test run in my environment and noted that even without calling `seed_everything` the model is initialized with the same weights across devices, which makes me think it is the latter. Is this correct?

And a quick follow-up: if I wanted to set a different seed for each device, how would I go about it? Just a normal `seed_everything` call, but with a different seed value for each process (e.g. using `self.global_rank` inside the module)? Something like the sketch below is what I have in mind. Thanks!
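Here is that sketch. It is only a toy illustration: `MyModule`, `base_seed`, the `Linear` layer, and the checksum print are things I made up to show the idea, not a proposed API.

```python
import hashlib

import torch
import lightning as L


class MyModule(L.LightningModule):
    def __init__(self, base_seed: int = 1):
        super().__init__()
        self.base_seed = base_seed
        self.layer = torch.nn.Linear(32, 1)

    def setup(self, stage: str) -> None:
        # Per-rank seeding: offset the base seed by the global rank so that,
        # from this point on, every process draws different random numbers.
        L.seed_everything(self.base_seed + self.global_rank)

        # Quick sanity check for my first question: print a checksum of this
        # layer's initial weights on every rank to eyeball whether they match.
        digest = hashlib.md5(
            self.layer.weight.detach().cpu().numpy().tobytes()
        ).hexdigest()
        print(f"rank={self.global_rank} initial weight digest={digest}")

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```

I realize seeding inside `setup()` only affects randomness from that point on (the layer built in `__init__` already has its weights), which is part of why I'm unsure what `seed_everything` is actually needed for here.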
cc @borda