facebookresearch / swav

PyTorch implementation of SwAV: https://arxiv.org/abs/2006.09882
Other

fixed seed, but no reproducibility #45

Closed: RGring closed this issue 3 years ago

RGring commented 3 years ago

Hi Mathilde, Thanks for your great work. I enjoyed reading your paper!

When running main_swav.py, the results are not reproducible, even though the seeds are set in utils.fix_random_seeds.

RUN1:
INFO - 12/07/20 09:38:11 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:38:30 - 0:00:25 - Epoch: [0][50] Loss 2.9354 (3.0861)

RUN2:
INFO - 12/07/20 09:37:31 - 0:00:06 - Epoch: [0][0] Loss 3.5037 (3.5037)
INFO - 12/07/20 09:37:51 - 0:00:25 - Epoch: [0][50] Loss 2.9074 (3.0710)

Do you experience the same? If yes, do you have a clue why that is the case (maybe distributed training)?

Thanks in advance!

mathildecaron31 commented 3 years ago

Hi @RGring , Thank you so much for your interest in this work.

I think bitwise reproducibility is indeed not guaranteed by PyTorch. I recommend reading https://pytorch.org/docs/stable/notes/randomness.html.
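For reference, the settings described on that page can be sketched as below. This is a best-effort helper, not the repo's own utils.fix_random_seeds: even with all of these flags, bitwise-identical runs are not guaranteed across machines, PyTorch versions, or multi-GPU / distributed training.

```python
import random

import numpy as np
import torch


def make_deterministic(seed: int = 31) -> None:
    """Best-effort determinism, following the PyTorch randomness notes.

    Illustrative helper (not part of the SwAV codebase). Newer PyTorch
    versions additionally offer torch.use_deterministic_algorithms(True),
    which raises an error when a nondeterministic CUDA op is hit.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # also seeds the CUDA RNG on all devices
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable nondeterministic autotuning


# Two "runs" with the same seed draw the same random tensors.
make_deterministic(31)
a = torch.randn(4)
make_deterministic(31)
b = torch.randn(4)
assert torch.equal(a, b)
```

Note that seeding alone does not cover nondeterministic CUDA kernels (e.g. some atomicAdd-based reductions) or the order of gradient all-reduces in distributed training, which is likely why the two runs above agree at iteration 0 but drift apart by iteration 50.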

fwtan commented 3 years ago


It seems the linear modules, i.e. the prototypes, are not explicitly initialized in the code, so they fall back to PyTorch's default initialization. The prototype weights may already make a difference, as they are not updated during the first epoch. I guess this might also cause the "duplicate prototypes" issue?
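If that is indeed the cause, one cheap check is to seed immediately before building the prototype layer and then overwrite the default weights with an explicit scheme, so that two runs start from identical prototypes. A minimal sketch, where the layer shape, the helper name, and the orthogonal init are illustrative assumptions rather than the repo's actual code:

```python
import torch
import torch.nn as nn


def build_prototypes(feat_dim: int, n_prototypes: int, seed: int) -> nn.Linear:
    """Hypothetical helper: a reproducibly initialized prototype layer.

    SwAV-style prototypes are a bias-free linear map from features to
    prototype scores; here we seed right before construction and then
    apply an explicit init so nothing depends on earlier RNG state.
    """
    torch.manual_seed(seed)
    prototypes = nn.Linear(feat_dim, n_prototypes, bias=False)
    # Illustrative choice of init; any explicit, seeded scheme would do.
    nn.init.orthogonal_(prototypes.weight)
    return prototypes


# Same seed -> bitwise-identical starting prototypes in both runs.
p1 = build_prototypes(64, 300, seed=31)
p2 = build_prototypes(64, 300, seed=31)
assert torch.equal(p1.weight, p2.weight)
```

This only removes one source of run-to-run variation; divergence from nondeterministic CUDA ops or distributed training would remain.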