Closed: amitness closed this issue 4 years ago
that’s the expected behavior. if you want deterministic transforms i would try to remove the stochasticity of the transforms?
@williamFalcon I see. It's tricky in my case because I want different images in the dataset to have a different stochastic transformation, but the overall run should be reproducible. So, each image gets assigned the same stochastic transformation even on re-run.
If I set a random seed where the transforms are applied, then all images would get the same transformation. I will try writing a custom dataset and using the index as the random seed to see if that works. If you've any other ideas, I would love to hear them.
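A stdlib-only sketch of the index-as-seed idea (the dataset and jitter transform here are hypothetical stand-ins, not the actual SimCLR pipeline):

```python
import random


class IndexSeededDataset:
    """Toy dataset: derives a local RNG from the sample index, so every
    epoch (and every re-run) applies the same 'random' jitter per sample,
    while different samples still get different transformations."""

    def __init__(self, values, jitter=0.5):
        self.values = values
        self.jitter = jitter

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        # A per-index RNG makes the augmentation a pure function of idx
        # without touching the global random state.
        rng = random.Random(idx)
        return self.values[idx] + rng.uniform(-self.jitter, self.jitter)
```

Using a local `random.Random(idx)` instead of calling `random.seed(idx)` avoids clobbering the global RNG that other parts of training may rely on.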
On a side note, how do researchers make sure their experiments are reproducible if there are random variations like this in papers such as SimCLR? I found it interesting that most implementations available have stochastic transformations.
@williamFalcon Setting the random seed based on the index worked. I noticed a few extra problems with Lightning:

- If I use `distributed_backend='ddp'`, then `seed_everything` and `deterministic=True` have no effect. The transformation becomes stochastic again even after setting the seeds.
- With `distributed_backend='dp'`, the manual seeds I set work.

Here is dp vs ddp on training a single batch.
Do I need to set seeds for CUDA when using ddp? I noticed the implementation of `seed_everything` in Lightning only does these, but doesn't do anything to make cuDNN deterministic:

```python
os.environ["PYTHONHASHSEED"] = str(seed)
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```
Usually, on forums, I've seen these additional steps recommended. Is there a reason why these were not used in the `seed_everything` function in Lightning?

```python
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
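Combining both lists, a full seeding helper might look like this sketch (my own hypothetical function, not Lightning's actual `seed_everything`; the torch/cuDNN part is guarded so the snippet also runs where PyTorch is not installed):

```python
import os
import random


def seed_everything_strict(seed: int) -> int:
    """Seed Python, NumPy and (if present) PyTorch/cuDNN.

    Sketch only: mirrors the steps discussed above. The cuDNN flags trade
    speed for reproducibility, which is why they are not on by default.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
    return seed
```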
as stated in the docs, setting deterministic also slows things down. that’s why we don’t do it by default.
ddp launches a python call to other processes, so some info may not be transferred. i would use ddp_spawn in this case
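For reference, the suggestion above corresponds to a Trainer configuration along these lines (a sketch against the Lightning API of that era; argument names such as `distributed_backend` may differ in newer releases):

```python
import pytorch_lightning as pl

pl.seed_everything(42)

trainer = pl.Trainer(
    gpus=2,
    distributed_backend="ddp_spawn",  # spawned workers inherit the seeded state
    deterministic=True,               # slower, but reproducible cuDNN kernels
)
```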
Thanks for all the help.
Hi @williamFalcon,
I have been trying to overfit SimCLR on a single batch containing 2 images. I added breakpoints and noticed that each time the same batch is loaded on subsequent epochs, the transformation applied is different. Is this expected?
My expectation was that the random transformations would be applied once at the start of training, and that the same transformation for a given batch would persist across epochs. I'm not sure if this is a bug or if I have a wrong understanding of how DataLoaders work.
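A toy stdlib-only illustration of the behavior in question (a hypothetical stand-in for a Dataset with a stochastic transform, not the actual SimCLR code): `__getitem__` draws fresh randomness on every access, and a DataLoader calls it again each epoch, so the augmentation is re-sampled rather than fixed at the start of training.

```python
import random


class StochasticDataset:
    """Toy dataset whose 'transform' draws new randomness on every access."""

    def __getitem__(self, idx):
        # Each call re-samples the jitter, as torchvision's random
        # transforms do, so epoch 1 and epoch 2 see different outputs.
        return idx + random.random()


ds = StochasticDataset()
epoch1 = ds[0]  # first time the batch is loaded
epoch2 = ds[0]  # same index next epoch: different augmentation
```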
Here is a minimal example with