Outputs of same batch produce different results with deterministic settings on

rbracco commented 3 years ago

I'm working on implementing something new, but I'm having trouble getting the model to produce the same output each time. The output from the featurizer is consistent, but the output from QuartzNet 15x5 (encoder + decoder) is slightly different each time. I am aware of #1030 and have taken the following steps:

Set the PyTorch Lightning seed with pl.utilities.seed.seed_everything(8)
Set num_workers = 0 in the dataloader (this is the default NeMo value anyway, right?)
Set shuffle=False in the train dataloader
Pass the flag deterministic=True to the Lightning trainer
Turn off dithering (this shouldn't really matter since I'm setting a seed)
Set the environment variable CUBLAS_WORKSPACE_CONFIG=:16:8

Any idea what could be causing this? If I load the model weights into a QuartzNet-like model in PyTorch Lightning and set the seed, the output is the same every time. Thank you!
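
One way to narrow down a problem like this is to re-seed, run each stage several times on the same batch, and report the first stage whose output changes. The sketch below is only an illustration: `first_nondeterministic_stage`, `featurizer`, and `noisy_encoder` are hypothetical stand-ins written with plain Python lists, not NeMo code. With real tensors, the `!=` comparison would need to be replaced by something like `torch.equal`.

```python
import random


def first_nondeterministic_stage(stages, batch, reseed, runs=3):
    """Return the name of the first stage whose output varies across runs.

    stages: list of (name, callable) pairs applied in order.
    reseed: zero-argument callable invoked before every run, standing in
            for whatever global seeding the real script performs.
    Returns None if every stage is repeatable.
    """
    inputs = batch
    for name, fn in stages:
        outputs = []
        for _ in range(runs):
            reseed()
            outputs.append(fn(inputs))
        if any(out != outputs[0] for out in outputs[1:]):
            return name
        # Feed one fixed output forward so later stages see identical input.
        inputs = outputs[0]
    return None


# Hypothetical toy stages (assumptions for illustration only).
def featurizer(x):
    return [v * 2.0 for v in x]


_private_rng = random.Random()  # seeded from OS entropy, not from random.seed()


def noisy_encoder(x):
    return [v + _private_rng.random() for v in x]


stages = [("featurizer", featurizer), ("encoder", noisy_encoder)]
culprit = first_nondeterministic_stage(
    stages, [0.1, 0.2, 0.3], reseed=lambda: random.seed(8)
)
print(culprit)
```

In this toy setup the global `random.seed(8)` call never reaches `_private_rng`, so the encoder stage is the one that gets flagged even though a seed is being set before every run.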

titu1994 commented 3 years ago

Could you try QuartzNet with batch size 1?

rbracco commented 3 years ago

Issue solved. SpecAugment and SpecCutout weren't getting the seed; setting it manually on their rng with self._rng.seed(8) in the __init__() of each fixed the problem. Thank you.
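
For readers hitting the same thing, here is a minimal sketch of why a global seed can miss an augmentation layer. It assumes the layer keeps a private `random.Random` instance in `self._rng`, which the `self._rng.seed(8)` call above suggests. `ToyMask` is a hypothetical stand-in, not the actual SpecAugment or SpecCutout code.

```python
import random


class ToyMask:
    """Hypothetical masking augmentation that owns a private rng."""

    def __init__(self, seed=None):
        # A fresh random.Random() is seeded from OS entropy, so a global
        # random.seed(...) call does not influence it.
        self._rng = random.Random()
        if seed is not None:
            # The fix described above: seed the private rng explicitly.
            self._rng.seed(seed)

    def __call__(self, values, width=2):
        # Zero out `width` consecutive entries at a random position.
        start = self._rng.randint(0, len(values) - width)
        out = list(values)
        out[start:start + width] = [0.0] * width
        return out


features = [1.0] * 10

# Global seeding only: the two masks are free to land in different places.
random.seed(8)
unseeded_a = ToyMask()(features)
random.seed(8)
unseeded_b = ToyMask()(features)

# Explicitly seeded private rng: the mask position is repeatable.
seeded_a = ToyMask(seed=8)(features)
seeded_b = ToyMask(seed=8)(features)
print(seeded_a == seeded_b)
```

The unseeded pair may or may not match on any given run, which is the kind of intermittent difference described in this issue; the explicitly seeded pair always matches.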

NVIDIA / NeMo

Outputs of same batch produce different results with deterministic settings on #2145