Closed: zshyang closed this issue 2 years ago
bro, sometimes you get different evaluations due to different initialization, which can be a random process.
You can see that your results are very different each time you run it.
Your results look very close to each other. Why are you claiming that the results are very different?
@JunpeiHuang: In the example, every seed involved in initialization is fixed (see the snippet below), so I do not think that is the problem.
import random

import numpy as np
import torch


def seed_all(random_seed):
    # seed PyTorch on the CPU and on all CUDA devices
    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)
    # ask cuDNN for deterministic kernels and disable autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # seed NumPy and Python's built-in RNG
    np.random.seed(random_seed)
    random.seed(random_seed)
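(For completeness, a minimal usage sketch of the helper above. It assumes the example also loads data through a multi-worker DataLoader, whose worker processes keep their own RNG state and therefore need seeding too; train_dataset is a hypothetical placeholder, not a name from the repository.)

import random

import numpy as np
import torch
from torch.utils.data import DataLoader

seed_all(42)  # fix all global RNGs before building the model and data pipeline

def seed_worker(worker_id):
    # derive a per-worker seed from the base seed so each worker is reproducible
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

# train_dataset stands in for the ModelNet40 dataset object used by the example
loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4,
                    worker_init_fn=seed_worker, generator=g)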
@aniqueakhtar: Yes, you are right. The results are not very different; I will modify that claim. The real problem is reproducibility, which is still a huge problem even if the differences are small. Think about this: you provide the same random seed and the exact code to others, but they just cannot reproduce your results, and even you yourself cannot produce the same numbers again. Isn't that frustrating?
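(A note for anyone who lands here with the same problem: the cuDNN flags above do not cover every operator. Assuming a reasonably recent PyTorch, 1.8 or later, the sketch below makes PyTorch raise an error whenever an op falls back to a non-deterministic kernel, which at least helps locate the source of the run-to-run differences.)

import os
import torch

# must be set before the first CUDA matmul for deterministic cuBLAS behavior
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# raise an error on any op that has no deterministic implementation
torch.use_deterministic_algorithms(True)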
That is why people provide a pre-trained model, which gives everyone the same results. How a person trains their model differs from person to person. I do not believe other deep learning libraries/tools would give you the same results during training, since training is itself a randomized process. Functions like Dropout, especially in .train() mode, are specifically implemented to introduce randomness during training. Quickly looking through the ModelNet40 classification code, I can see a Dropout layer being used. A BatchNorm layer could also cause this (though I am not completely sure): BatchNorm computes a running mean and variance that is used during prediction.
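(To illustrate that point, a toy sketch, not taken from the repository, showing that a single Dropout layer already produces different outputs on every forward pass in .train() mode, while being a no-op in .eval() mode.)

import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()      # training mode: a fresh random mask is drawn on every call
print(drop(x))
print(drop(x))    # differs from the first call even though the seed is fixed

drop.eval()       # eval mode: dropout is disabled
print(drop(x))    # equal to x on every call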
Nonetheless, in deep learning, the final pretrained model is what is important. Everyone has their own tricks/parameters to train their model.
@aniqueakhtar: Thanks for your reply. Parts of your statement are correct.
Nonetheless, whether reproducibility or a pre-trained model is more important depends on the situation, right? I would say that reproducibility is more important than just providing a pre-trained model that can never be reproduced. But I also accept that you may think the pre-trained model is more important.
@zshyang First, run the code and get the result before you complain. Second, please be nice when you ask.
Hey, what is wrong with my sentence? The Internet has memory. Remember that. I ran the code and got the result. It is posted at the beginning of this issue.
@zshyang First, run the code and get the result before you complain. Second, don't be an asshole :)
Describe the bug
If you run the example for classification on ModelNet40 with the arguments max_epoch=10 and stat_freq=1, you can see that your results are different each time you run it. I would suggest not using this package as a research tool until this issue is fixed, because your results cannot be precisely reproduced.
First run:
Second run:
To Reproduce
It is in the description.
Expected behavior
The output should be identical across runs.
Desktop (please complete the following information):
Doesn't matter.