Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0
28.57k stars 3.4k forks

Seeding and multi-GPU training #20188

Open tomsons22 opened 3 months ago

tomsons22 commented 3 months ago

📚 Documentation

I'm training a model in a multi-GPU environment using the DDP strategy. Looking here, I see that it is important to call L.seed_everything(...) to make sure the model is initialized the same way across devices. However, here it says that this is not needed. I tried a test run in my environment and noticed that even without calling seed_everything the model is initialized with the same weights across devices, which makes me think the latter is correct. Is that right?

And a quick follow-up: if I wanted to set a different seed for each device, how would I go about it? Just call seed_everything as usual but with a different seed value for each process (e.g. using self.global_rank inside the module)?
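In case it helps clarify what I mean, here is a minimal sketch of the per-rank seeding idea in plain PyTorch, with no DDP involved. `seed_for_rank` is just a hypothetical helper I made up; in an actual LightningModule the rank would come from `self.global_rank`:

```python
import torch

def seed_for_rank(base_seed: int, global_rank: int) -> int:
    # Hypothetical helper: derive a distinct seed per process by
    # offsetting the base seed with the process's global rank.
    return base_seed + global_rank

# Simulate two DDP ranks seeding their RNGs independently.
torch.manual_seed(seed_for_rank(42, 0))
rank0_noise = torch.randn(4)

torch.manual_seed(seed_for_rank(42, 1))
rank1_noise = torch.randn(4)

# Different seeds -> different random streams on each "rank".
assert not torch.equal(rank0_noise, rank1_noise)
```

This is what I'd want for things like per-rank data augmentation noise, while presumably still needing matching weights at init.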

Thanks

cc @borda

jlotthammer commented 3 months ago

Also very interested in an answer to this, as I'm seeing conflicting documentation online here too, e.g. on WANDB:

import pytorch_lightning as pl

def main():
    # Set all the random seeds to the same value.
    # This is important in a distributed training setting:
    # without it, each rank would get its own set of initial weights.
    # If the weights don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)
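For what it's worth, the effect that comment describes is easy to check locally with plain PyTorch (a sketch, no DDP or Lightning involved):

```python
import torch

# With the same seed, two independently constructed layers get identical
# initial weights -- this mimics what matching seeds across DDP ranks do.
torch.manual_seed(1)
layer_a = torch.nn.Linear(8, 4)

torch.manual_seed(1)
layer_b = torch.nn.Linear(8, 4)

assert torch.equal(layer_a.weight, layer_b.weight)
assert torch.equal(layer_a.bias, layer_b.bias)
```

What I can't tell from the docs is whether Lightning's DDP strategy makes this unnecessary by broadcasting rank 0's weights to the other ranks, which would explain what tomsons22 observed.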