Lightning-AI / pytorch-lightning

Pretrain, finetune, and deploy AI models on multiple GPUs and TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Confusion on random transformation for same batch #2975

Closed: amitness closed this issue 4 years ago

amitness commented 4 years ago

Hi @williamFalcon,

I have been trying to overfit SimCLR on a single batch containing 2 images. I added breakpoints and noticed that each time the same batch is loaded on subsequent epochs, the transformation applied is different. Is this expected?

My expectation was that the random transformations would be applied once at the start of training and that the same transformation for a given batch would persist across epochs. I'm not sure if this is a bug or if I have a wrong understanding of how DataLoaders work.

Here is a minimal example:

from pl_bolts.models.self_supervised import SimCLR
import pytorch_lightning as pl

pl.seed_everything(42)

model = SimCLR(data_dir='/tmp', batch_size=2)

trainer = pl.Trainer(gpus=1, 
                     overfit_batches=1,
                     deterministic=True)
trainer.fit(model)
williamFalcon commented 4 years ago

that’s the expected behavior. if you want deterministic transforms i would try to remove the stochasticity of the transforms?

amitness commented 4 years ago

@williamFalcon I see. It's tricky in my case because I want different images in the dataset to have different stochastic transformations, but the overall run should be reproducible. That is, each image should be assigned the same stochastic transformation even on a re-run.

If I set a random seed where the transforms are applied, then all images would get the same transformation. I will try writing a custom dataset and using the index as the random seed to see if that works. If you have any other ideas, I would love to hear them.
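
A minimal sketch of the index-seeded idea, assuming a hypothetical map-style base dataset and a torchvision-style stochastic transform (the wrapper name IndexSeededDataset is my own):

import random

import torch
from torch.utils.data import Dataset

class IndexSeededDataset(Dataset):
    """Wrap a dataset so that each sample always sees the same random transform.

    The sample index is used as the seed, so a re-run reproduces the exact
    augmentation for each image while different images still get different
    augmentations.
    """

    def __init__(self, base_dataset, transform):
        self.base_dataset = base_dataset  # any map-style dataset returning (image, label)
        self.transform = transform        # stochastic transform, e.g. from torchvision.transforms

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        image, label = self.base_dataset[index]
        # Seed the RNGs with the sample index so the "random" transform is
        # deterministic per image, across epochs and across re-runs.
        random.seed(index)
        torch.manual_seed(index)
        return self.transform(image), label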


On a side note, how do researchers make sure their experiments are reproducible if there are random variations like this in papers such as SimCLR? I found it interesting that most implementations available have stochastic transformations.

amitness commented 4 years ago

@williamFalcon Setting the random seed based on the index worked. However, I noticed a few extra problems with Lightning:

  1. When I set distributed_backend='ddp', then seed_everything and deterministic=True have no effect. The transformation becomes stochastic again even after setting the seeds.
  2. When I use distributed_backend='dp', the manual seeds I set work.

Here is dp vs ddp on training a single batch (screenshot: 2020-08-14 17-33-14).

Do I need to set seeds for CUDA when using ddp? I noticed that the implementation of seed_everything in Lightning only does the following, and doesn't do anything to make cuDNN deterministic:

    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

Usually, on forums, I've seen these additional steps recommended. Is there a reason why they were not included in Lightning's seed_everything function?

torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
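
Put together, a hedged sketch of what that would look like; the helper name seed_everything_deterministic is my own, not a Lightning API:

import os
import random

import numpy as np
import torch

def seed_everything_deterministic(seed: int = 42):
    # The same seeds that Lightning's seed_everything already sets.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # The extra CUDA/cuDNN settings recommended on the forums.
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False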
williamFalcon commented 4 years ago

as stated in the docs, setting deterministic also slows things down. that’s why we don’t do it by default.

ddp launches a python call to other processes, so some info may not be transferred. i would use ddp_spawn in this case
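
As a quick sketch of that suggestion, assuming the same SimCLR setup as above and 2 GPUs (distributed_backend was the Trainer argument name in the Lightning version used here):

from pl_bolts.models.self_supervised import SimCLR
import pytorch_lightning as pl

pl.seed_everything(42)

model = SimCLR(data_dir='/tmp', batch_size=2)

# ddp_spawn spawns the worker processes from this script instead of
# relaunching it, which is the behaviour suggested above for keeping
# the seeded, deterministic setup.
trainer = pl.Trainer(gpus=2,
                     distributed_backend='ddp_spawn',
                     overfit_batches=1,
                     deterministic=True)
trainer.fit(model)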

amitness commented 4 years ago

Thanks for all the help.