I am having trouble achieving full reproducibility when running the code. Despite setting the `seed` to 0 and enabling `deterministic=True` in `train.py`, the results are not entirely identical across runs. For example, the loss values differ between Case 1 and Case 2, which I've attached below. Could you please advise whether additional steps are required to ensure exact reproducibility?
Atomic operations in CUDA execute in a nondeterministic order when threads run in parallel, and because floating-point addition is not associative, the order in which partial results are accumulated can change the final value. Therefore, even with a fixed seed and deterministic settings enabled, training results can still vary slightly between runs.
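The underlying effect can be reproduced in plain Python, with no GPU involved: summing the same floating-point values in two different orders yields two different results, which is exactly what happens when parallel atomic adds accumulate in an arbitrary order. A minimal illustration (not tied to any particular framework):

```python
# Floating-point addition is not associative: accumulating the same
# values in a different order can produce a different final sum.
vals = [0.1, 0.2, 0.3]

left_to_right = (vals[0] + vals[1]) + vals[2]  # one accumulation order
right_to_left = vals[0] + (vals[1] + vals[2])  # another order

print(left_to_right)                   # 0.6000000000000001
print(right_to_left)                   # 0.6
print(left_to_right == right_to_left)  # False
```

If the framework in use is PyTorch, calling `torch.use_deterministic_algorithms(True)` can force deterministic kernel implementations where they exist (some operations will instead raise an error because no deterministic variant is available), but nondeterministic atomics in third-party or custom CUDA kernels remain outside its control.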