Difference pytorch 0.4.1 vs 1.1

glitchmobbed commented 5 years ago

Hi, this may not be the correct place to ask, but I thought it was worth a shot. First off, thanks for releasing this code; it has been very helpful and I have learned a lot. I am evaluating this model for semantic segmentation of large-scale laser scan data, and so far I have used the older PyTorch 0.4 build with CUDA 8. For several reasons, I recently tried the newest release with PyTorch 1.1 + CUDA 10 + cuDNN 7.3.1.

I am noticing significantly worse convergence behaviour (depending on the model setup and training data, accuracy ends up anywhere from 2 to 8% lower once training stops improving) and visibly worse segmentation results on test data with the newer build, even though the model parameters, hyperparameters, data loading, and hardware are all the same. Also, when loading the same model checkpoint and evaluating with model.eval() on exactly the same data, there are some minor differences even after the first SA layer, although I am not sure this is the reason for the worse training performance.
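One way to narrow down where the two builds start to disagree is to dump each layer's output in both environments and compare the dumps offline. The following is only a minimal sketch in plain Python, not code from this repo: it assumes each build has exported its activations as flat lists of floats (for example via `tensor.detach().cpu().reshape(-1).tolist()`), and the layer names `sa1` and `sa2`, the sample values, and the tolerance are hypothetical placeholders.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergent_layer(old_acts, new_acts, atol=1e-5):
    """Return (layer_name, diff) for the first layer whose outputs differ by
    more than atol, or None if every layer agrees within atol.

    old_acts / new_acts map a layer name to a flat list of floats and are
    expected to be filled in forward order (dicts keep insertion order).
    """
    for name, old in old_acts.items():
        diff = max_abs_diff(old, new_acts[name])
        if diff > atol:
            return name, diff
    return None


# Hypothetical dumps from the two builds (made-up values, placeholder names).
old_build = {"sa1": [0.10, 0.20, 0.30], "sa2": [1.00, 2.00]}
new_build = {"sa1": [0.10, 0.20, 0.30], "sa2": [1.00, 2.01]}

result = first_divergent_layer(old_build, new_build)
print(result)
```

Differences near float32 rounding level are commonly seen when kernels or reduction order change between CUDA/cuDNN versions, so the tolerance is a judgment call; a much larger gap at a specific layer would point more strongly at a behavioural change in that layer.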

I came across some tidbits here and there about different behaviour of log_softmax in the newer version of nn.CrossEntropyLoss(), or maybe the reason is that I had to compile PyTorch 1.1 myself, whereas for 0.4 I installed a pre-compiled version. For building PyTorch 1.1 and your extensions I used gcc 7.3 with nvcc 10. Everything is in anaconda3 on an Ubuntu box. If you have any ideas, I'd be very happy to hear them! Cheers, Johannes
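For reference, nn.CrossEntropyLoss is documented as combining a log-softmax with a negative log-likelihood loss, so a single-sample value can be checked by hand. The sketch below is a plain-Python reference, not PyTorch's implementation; the logits and target are made-up values. Feeding the same logits to both builds and comparing against such a reference is one way to test whether the loss itself behaves differently.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax: x_i - logsumexp(x)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(logits, target):
    """Cross-entropy for one sample: negative log-softmax at the target index."""
    return -log_softmax(logits)[target]


# Made-up logits for a 3-class problem; class 0 is the true label.
logits = [2.0, 1.0, 0.1]
loss = cross_entropy(logits, target=0)
print(loss)
```

For a batch, the per-sample values would be averaged to match the default reduction of nn.CrossEntropyLoss.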

erikwijmans commented 5 years ago

I don't have a build of PyTorch master anywhere, so I can't easily try things on master. I tried PyTorch 1.0.1 + CUDA 10 + cuDNN 7.4 and I am seeing the results I would expect for the example semantic segmentation script, so I don't believe I introduced any bugs when upgrading this repo to PyTorch >= 1.0 (there definitely could still be some, though, as deep learning is really good at managing to do something sensible even with horrible bugs).

Could you try PyTorch 1.0.1?

glitchmobbed commented 5 years ago

Hello, thank you for the suggestion. I compiled PyTorch 1.0.1 with CUDA 10 and cuDNN 7.3.1. The discrepancies in loss and accuracy look the same as with PyTorch 1.1. I'm using a moderately large custom dataset, so it could be that this only shows up with my data; I haven't tested on indoor3d or modelnet yet. I might do that next.

Cheers, Johannes
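Since the setups compared in this thread differ in several versions at once (PyTorch, CUDA, cuDNN), it can help to print the exact versions from inside each environment and attach them to any comparison. This is a small sketch, not part of this repo; the function name `environment_report` is hypothetical, and the import is guarded so the script also runs where PyTorch is not installed.

```python
import platform

try:
    import torch
except ImportError:  # allow the script to run on machines without PyTorch
    torch = None


def environment_report():
    """Collect version details that are useful when comparing two builds."""
    info = {"python": platform.python_version(),
            "torch": None, "cuda": None, "cudnn": None}
    if torch is not None:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None on CPU-only builds).
        info["cuda"] = torch.version.cuda
        # cuDNN version as reported by PyTorch.
        info["cudnn"] = torch.backends.cudnn.version()
    return info


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Running this in both the 0.4 and the 1.x environment gives a side-by-side record of what actually differs between them.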

erikwijmans / Pointnet2_PyTorch, issue #37