NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

MinkowskiEngine is not a good research tool for reproducibility. #452

Closed zshyang closed 2 years ago

zshyang commented 2 years ago

Describe the bug If you run the ModelNet40 classification example with the arguments max_epoch=10 and stat_freq=1, you will see that the results differ each time you run it. Until this issue is fixed, I would suggest not using this package as a research tool, because your results cannot be precisely reproduced.

First run:

Iter: 0, Loss: 3.816e+00
Iter: 1, Loss: 3.602e+00
Iter: 2, Loss: 3.470e+00
Iter: 3, Loss: 3.473e+00
Iter: 4, Loss: 3.609e+00
Iter: 5, Loss: 3.712e+00
Iter: 6, Loss: 3.649e+00
Iter: 7, Loss: 3.438e+00
Iter: 8, Loss: 3.357e+00
Iter: 9, Loss: 3.591e+00

Second run:

Iter: 0, Loss: 3.816e+00
Iter: 1, Loss: 3.602e+00
Iter: 2, Loss: 3.469e+00
Iter: 3, Loss: 3.482e+00
Iter: 4, Loss: 3.623e+00
Iter: 5, Loss: 3.739e+00
Iter: 6, Loss: 3.618e+00
Iter: 7, Loss: 3.440e+00
Iter: 8, Loss: 3.350e+00
Iter: 9, Loss: 3.568e+00

To Reproduce

It is in the description.


Expected behavior

With a fixed random seed, the output should be identical across runs.
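
As an illustration of this expectation, here is a minimal toy sketch (plain PyTorch on CPU, not the ModelNet40 example) in which two seeded runs produce identical per-iteration losses:

import torch

def toy_run(seed, iters=5):
    # Reset the global RNG so weight init and data generation repeat exactly.
    torch.manual_seed(seed)
    model = torch.nn.Linear(8, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    data, target = torch.randn(iters, 8), torch.randn(iters, 1)
    losses = []
    for x, y in zip(data, target):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

# Two runs with the same seed log identical losses.
assert toy_run(42) == toy_run(42)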


Desktop (please complete the following information):

Doesn't matter.



JunpeiHuang commented 2 years ago

Bro, sometimes you get different evaluations due to different initialization, which can be a random process.

aniqueakhtar commented 2 years ago

you could see your results are very different each time you run it.

Your results look very close to each other. Why are you claiming that the results are very different?

zshyang commented 2 years ago

@JunpeiHuang: In the example, every single detail about initialization is fixed. I do not think that is the problem.

import random
import numpy as np
import torch

def seed_all(random_seed):
    # Seed PyTorch on CPU and on every GPU.
    torch.manual_seed(random_seed)
    torch.cuda.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)
    # Force deterministic cuDNN kernels and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Seed NumPy and Python's built-in RNG.
    np.random.seed(random_seed)
    random.seed(random_seed)

@aniqueakhtar: Yes, you are right. The results are not very different. I will modify that claim. The real problem is reproducibility, which remains a big problem even if the differences are small. Think about this: you provide the same random seed and the exact code to others, but they cannot reproduce your results, and even you yourself cannot produce the same numbers again. Isn't that frustrating?

aniqueakhtar commented 2 years ago

Think about this: you provide the same random seed and the exact code to others, but they cannot reproduce your results, and even you yourself cannot produce the same numbers again. Isn't that frustrating?

That is why people provide a pre-trained model, which gives everyone the same results. How a person trains their model differs from person to person. I do not believe other deep learning libraries/tools would give you the same results during training, since training is itself a randomized process. Functions like Dropout, especially in .train() mode, are specifically implemented to introduce randomness during training. Quickly looking through the ModelNet40 classification code, I can see a Dropout layer being used. A BatchNorm layer could also cause this (but I am not completely sure); BatchNorm computes a running mean and variance that is used during prediction.
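
For illustration, a minimal plain-PyTorch sketch (not the MinkowskiEngine example) of the randomness Dropout introduces in .train() mode and how it is disabled in .eval() mode:

import torch

drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # random mask; generally differs between calls
print(drop(x))

drop.eval()
print(drop(x))  # dropout disabled at evaluation time, output equals x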

Nonetheless, in deep learning, the final pretrained model is what is important. Everyone has their own tricks/parameters to train their model.

zshyang commented 2 years ago

@aniqueakhtar : Thanks for your reply. Some of your statements are correct.

  1. To my knowledge, Dropout is also a fixed process once you have specified the random seed (see the sketch after this list).
  2. From my experience with other networks that use Dropout, like PointNet, the training procedure can be reproduced precisely each time I train the network.
  3. The same holds for BatchNorm: its statistics can be controlled once you fix the order of your data.
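
A minimal sketch of point 1, assuming plain PyTorch on CPU: re-seeding the global RNG makes the Dropout mask repeat exactly.

import torch

def dropout_once(seed):
    # Reset the global RNG, then draw one Dropout mask.
    torch.manual_seed(seed)
    drop = torch.nn.Dropout(p=0.5).train()
    return drop(torch.ones(4, 4))

# Same seed, same mask: the "random" process is fully determined by the seed.
assert torch.equal(dropout_once(0), dropout_once(0))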

Nonetheless, whether reproducibility or a pre-trained model is more important depends on the situation, right? I would say reproducibility matters more than just providing a pre-trained model whose training can never be reproduced. But I also accept that you may consider the pre-trained model more important.

chrischoy commented 2 years ago

@zshyang First, run the code and get the result before you complain. Second, please be nice when you ask.

zshyang commented 2 years ago

Hey, what is wrong with my sentence? The Internet has memory. Remember that. I ran the code and got the result. It is posted at the beginning of this issue.

@zshyang First, run the code and get the result before you complain. Second, don't be an asshole :)