Open · tomsons22 opened 3 months ago
Also very interested in an answer to this, as I'm seeing conflicting documentation online here too, e.g. this snippet from the W&B docs:
```python
def main():
    # Setting all the random seeds to the same value.
    # This is important in a distributed training setting.
    # Each rank will get its own set of initial weights.
    # If they don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)
```
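For context, here is a minimal sketch of how that call would sit in a full DDP run. The `ToyModel`, the random tensors, and the two-GPU `Trainer` settings are just placeholders I picked to make it self-contained, not anything from the docs beyond the seeding line:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


def main():
    # Seed every process with the same value *before* the model is built,
    # so each rank starts from identical initial weights (the pattern from
    # the W&B snippet above).
    pl.seed_everything(1)

    model = ToyModel()
    data = DataLoader(
        TensorDataset(torch.randn(256, 32), torch.randn(256, 1)),
        batch_size=32,
    )

    # Placeholder settings: assumes a machine with 2 GPUs.
    trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp", max_epochs=1)
    trainer.fit(model, data)


if __name__ == "__main__":
    main()
```

As I understand it, with `strategy="ddp"` each rank runs the whole script, so the seeding line executes once per process.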
📚 Documentation
I'm training a model in a multi-GPU environment using the DDP strategy. Looking here, I see that it is important to call `L.seed_everything(...)` to make sure the model is initialized the same way across devices. However, here it says that this is not needed. I tried a test run in my environment and noted that even without calling `seed_everything` the model is initialized with the same weights across devices, which makes me think it is the latter. Is this correct?

And a quick follow-up: if I wanted to set a different seed for each device, how would I go about it? Just a normal `seed_everything` call, but with a different seed value for each process (e.g. using `self.global_rank` inside the module)? Something like the sketch below is what I have in mind. Thanks!
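Here is that sketch. It is only a toy illustration: `MyModule`, `base_seed`, the `Linear` layer, and the checksum print are things I made up to show the idea, not a proposed API.

```python
import hashlib

import torch
import lightning as L


class MyModule(L.LightningModule):
    def __init__(self, base_seed: int = 1):
        super().__init__()
        self.base_seed = base_seed
        self.layer = torch.nn.Linear(32, 1)

    def setup(self, stage: str) -> None:
        # Per-rank seeding: offset the base seed by the global rank so that,
        # from this point on, every process draws different random numbers.
        L.seed_everything(self.base_seed + self.global_rank)

        # Quick sanity check for my first question: print a checksum of this
        # layer's initial weights on every rank to eyeball whether they match.
        digest = hashlib.md5(
            self.layer.weight.detach().cpu().numpy().tobytes()
        ).hexdigest()
        print(f"rank={self.global_rank} initial weight digest={digest}")

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```

I realize seeding inside `setup()` only affects randomness from that point on (the layer built in `__init__` already has its weights), which is part of why I'm unsure what `seed_everything` is actually needed for here.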
cc @borda