GFNOrg / gflownet

Generative Flow Networks
MIT License

About Reproducibility Issues #8

Closed dongqian0206 closed 2 years ago

dongqian0206 commented 2 years ago

Hi there,

Thank you very much for sharing the source code.

For reproducibility, I modified the code as follows:

https://github.com/GFNOrg/gflownet/blob/831a6989d1abd5c05123ec84654fb08629d9bc38/mols/gflownet.py#L84

---> self.train_rng = np.random.RandomState(142857)

and added:

torch.manual_seed(142857)
torch.cuda.manual_seed(142857)
torch.cuda.manual_seed_all(142857)

However, I encountered an issue. I ran it more than 3 times with the same random seed, but the results differ from run to run (although they are close). I didn't modify any other parts, apart from fixing package compatibility issues.

0 [1152.62, 112.939, 23.232] 100 [460.257, 44.253, 17.728] 200 [68.114, 6.007, 8.045]

0 [1151.024, 112.603, 24.993] 100 [471.219, 45.525, 15.964] 200 [66.349, 6.174, 4.607]

0 [1263.066, 124.094, 22.128] 100 [467.747, 44.899, 18.76] 200 [61.992, 5.715, 4.841]

Have you encountered this issue before?

Best,

Dong

bengioe commented 2 years ago

Hi Dong,

Even with seeding, CUDA operations are not guaranteed to be deterministic, so results will change from run to run. This is especially problematic in online-RL-type setups, since tiny changes in parameters can change the (self-generated) data distribution, and that effect compounds very quickly and pushes models in different directions.
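
As an illustration of how such tiny differences can arise (the exact cause in this codebase is an assumption, not something verified here): floating-point addition is not associative, so a parallel reduction that accumulates the same numbers in a different order can produce a slightly different result. A minimal pure-Python sketch:

```python
# Floating-point addition is not associative: the same three numbers
# summed in a different order can give a slightly different result.
a, b, c = 0.1, 0.2, 0.3

order_1 = (a + b) + c  # accumulate left to right
order_2 = a + (b + c)  # same values, different grouping

print(order_1 == order_2)        # False
print(abs(order_1 - order_2))    # tiny, but non-zero
```

A difference of this size in a single gradient step is harmless on its own, but in an online setup it changes which samples the model generates next, so later steps diverge further.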

There are ways to get deterministic results, but these usually involve much slower operations, which are undesirable in research.
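
For reference, a minimal sketch of what this can look like in PyTorch, assuming a reasonably recent version (`torch.use_deterministic_algorithms` was added in 1.8). The helper name `make_deterministic` is hypothetical and not part of this repository, and the sketch has not been tested against `mols/gflownet.py`:

```python
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 142857) -> None:
    """Hypothetical helper: seed the common RNGs and request deterministic kernels."""
    # cuBLAS needs this set before the first CUDA call for deterministic
    # matrix multiplications (CUDA 10.2 and later).
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable

    # Disable cuDNN autotuning and prefer deterministic cuDNN kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # Raise an error if an op without a deterministic implementation is used.
    torch.use_deterministic_algorithms(True)


make_deterministic()
a = torch.rand(3)
make_deterministic()
b = torch.rand(3)
print(torch.equal(a, b))
```

Note that this only covers the global RNGs and PyTorch kernels. A separately constructed generator such as `np.random.RandomState(142857)` keeps its own seed, and any other source of randomness in the training script would still need to be handled on its own. Operations without a deterministic implementation will raise a `RuntimeError` instead of running.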

dongqian0206 commented 2 years ago

OK. Thank you for your prompt reply.