ankmathur96 opened this issue 5 years ago
There are a lot of factors at play for a given result: PyTorch version, CUDA, PIL, etc. Even changing the image scaling between bicubic and bilinear can have a notable impact. I default to bicubic, but bilinear works better for some models, likely depending on what they were originally trained with.
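For the usual torchvision eval pipeline, swapping the filter looks something like this (a rough sketch using the PIL-constant interpolation argument of that era's torchvision, not this repo's exact code):

```python
from PIL import Image
from torchvision import transforms

# Standard ImageNet eval preprocessing. torchvision's Resize defaults to
# bilinear (PIL.Image.BILINEAR), so bicubic must be requested explicitly.
def eval_transform(interp=Image.BICUBIC):
    return transforms.Compose([
        transforms.Resize(256, interpolation=interp),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

bicubic_tf = eval_transform(Image.BICUBIC)
bilinear_tf = eval_transform(Image.BILINEAR)
```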
I have noticed changes in accuracy for many models that I measured over a year ago to now (same weights).
My ResNet50 number with PyTorch 1.0.1.post2 and CUDA 10: Prec@1 75.868, Prec@5 92.872
My old ResNet50 numbers with PyTorch (0.2.0.post1) and CUDA 9.x?: Prec@1 76.130, Prec@5 92.862
A table with some of my old measurements here: https://github.com/rwightman/pytorch-dpn-pretrained
ResNet50 on PyTorch 1.0.1.post2 and CUDA 10 w/ bilinear instead of bicubic: Prec@1 76.138, Prec@5 92.864 ... matches your numbers @ankmathur96
Interesting! I should mention that I am using PIL version 5.3.0.post0.
I believe that bilinear is the default in PyTorch transforms (https://github.com/pytorch/vision/blob/master/torchvision/transforms/transforms.py#L182) and it seems this repository is using the default (https://github.com/cgnorthcutt/benchmarking-keras-pytorch/blob/master/imagenet_pytorch_get_predictions.py#L95). It's interesting to note the difference when using bicubic though.
I've also seen variation with different CUDA versions and other setup differences similar to what you're describing. For example, I've seen a full percentage point drop when using OpenCV's bilinear resizing implementation, as compared to PIL's. I was unaware, though, that setup differences could cost a full percentage point within this kind of more constrained setting (same PyTorch/CUDA/PIL stack). I found this especially worth highlighting since this repo's evaluation seems to be off by enough that densenet169 performs worse than ResNet-50 in my setup.
Edit: it's worth noting that many such differences due to subtle changes in preprocessing implementations can be eliminated (if need be, for a production use case) by fine-tuning with a low learning rate for several epochs.
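For anyone who wants to see the PIL-vs-OpenCV gap directly, here is a toy check (random image, both libraries at their defaults; exact values will depend on the installed versions):

```python
import cv2
import numpy as np
from PIL import Image

# Same uint8 image, bilinear downscale in both libraries.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

pil_out = np.asarray(
    Image.fromarray(img).resize((32, 32), resample=Image.BILINEAR))
cv_out = cv2.resize(img, (32, 32), interpolation=cv2.INTER_LINEAR)

# Nonzero in practice: Pillow antialiases on downscale (filter support
# scales with the reduction factor), OpenCV samples a fixed 2x2 window.
print("max abs difference:", np.abs(
    pil_out.astype(int) - cv_out.astype(int)).max())
```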
@ankmathur96 yeah, I noticed when I was doing my benchmarking in the past that most of the resnet/densenet models in torchvision were better with the default bilinear, but a number of the ported models (Inception variants, DPN, etc.) were doing better with bicubic.
Fine-tuning can definitely help with these sorts of issues if/when it matters. It's also worth noting that many of the default pretrained weights can pretty easily be surpassed by around 1% or more using different training schedules and augmentation techniques.
FWIW, my densenet169 numbers are very close to this repo's: lower than my ResNet50 numbers at top-1 but better at top-5.
I'm using Pillow-SIMD 5.3.0.post0
@ankmathur96 @rwightman Thanks for finding this. I agree it's likely a PyTorch version / CUDA version incompatibility. Did either of you find a fix? Feel free to send a pull request on https://github.com/cgnorthcutt/benchmarking-keras-pytorch/blob/master/imagenet_pytorch_get_predictions.py
@ankmathur96

> I get 76.138% top-1 accuracy.

@rwightman

> My ResNet50 number with PyTorch 1.0.1.post2 and CUDA 10: Prec@1 75.868, Prec@5 92.872
> My old ResNet50 numbers with PyTorch (0.2.0.post1) and CUDA 9.x?: Prec@1 76.130, Prec@5 92.862
The difference between 75.868% and 76.130% (a 0.262% gap) is not statistically significant with only 50,000 validation samples. The standard deviation of a binomial distribution with p = 0.76 and n = 50,000 is sqrt(0.76 * (1 - 0.76) / 50000) * 100 ≈ 0.19%.
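Spelled out as a quick Python check (the constants are just the numbers quoted above):

```python
import math

# Standard deviation of top-1 accuracy measured on the 50,000-image
# ImageNet validation set, modeled as a binomial with p ~= 0.76.
p, n = 0.76, 50_000
std_pct = math.sqrt(p * (1 - p) / n) * 100
print(f"std of measured accuracy: {std_pct:.3f}%")  # ~0.191%

# Observed gap between the two ResNet50 runs, in percentage points.
gap = 76.130 - 75.868
print(f"gap / std = {gap / std_pct:.2f}")  # ~1.37 sigma -> not significant
```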
@ankmathur96

> a full percentage point drop when using OpenCV's bilinear resizing implementation, as compared to PIL's.
See these two URLs for the differences in bilinear resizing across libraries, or even within the same library and function under different padding options:

https://stackoverflow.com/questions/18104609/interpolating-1-dimensional-array-using-opencv
https://stackoverflow.com/questions/43598373/opencv-resize-result-is-wrong
Also see https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35
TFv2 now follows Pillow, not OpenCV, if there is a difference between the two (https://github.com/tensorflow/tensorflow/issues/6720)... which doesn't seem to be the case (https://github.com/chainer/onnx-chainer/issues/147).
See also Caleb Robinson (@calebrob6), "How to reproduce ImageNet validation results": http://calebrob.com/ml/imagenet/ilsvrc2012/2018/10/22/imagenet-benchmarking.html
> For every image in the validation set we need to apply the following process:
>
> - Load the image data in a floating point format.
> - Resize the smallest side of the image to 256 pixels using bicubic interpolation over a 4x4 pixel neighborhood (using OpenCV's resize method with the INTER_CUBIC interpolation flag). The larger side should be resized to maintain the original aspect ratio of the image.
> - Crop the central 224x224 window from the resized image.
> - Save the image in RGB format. [...]
>
> All the steps above are shown in the notebooks from the accompanying GitHub repository.
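A minimal sketch of those steps with OpenCV (the function name and structure here are mine, not the notebook's actual code):

```python
import cv2
import numpy as np

def preprocess(path):
    """Resize shortest side to 256 (bicubic), then center-crop 224x224."""
    img = cv2.imread(path).astype(np.float32)   # load as float (BGR order)
    h, w = img.shape[:2]
    scale = 256.0 / min(h, w)                   # shortest side -> 256 px
    img = cv2.resize(img, (round(w * scale), round(h * scale)),
                     interpolation=cv2.INTER_CUBIC)
    h, w = img.shape[:2]
    top, left = (h - 224) // 2, (w - 224) // 2  # central 224x224 window
    img = img[top:top + 224, left:left + 224]
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # deliver RGB
```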
Hey there!
I came across your project via Jeremy Howard's Twitter. I think it's great to be benchmarking these numbers and keeping them in a single place!
I've tried running your script and ran into some problems that I was hoping you could help diagnose: I ran
```
python imagenet_pytorch_get_predictions.py -m resnet50 -g 0 -b 64 ~/imagenet/
```
and got unexpectedly low accuracy. I'm using Python 3.7 and PyTorch 1.0.1.post2, and didn't change any of your code except for making the argparse parameter for batch_size type=int.
I work pretty regularly with PyTorch and ResNet-50 and was surprised to see ResNet-50 reach only 75.02% validation accuracy here. When I evaluate the pretrained ResNet-50 with the reference PyTorch ImageNet example (main.py), I get 76.138% top-1 and 92.864% top-5 accuracy. Specifically, I run:
```
python main.py -a resnet50 -e -b 64 -j 8 --pretrained ~/imagenet/
```
I'm using CUDA 9.2 and cuDNN 7.4.1, running inference on an NVIDIA V100 on a Google Cloud instance with Ubuntu 16.04.
I'm curious what might be going wrong here and why our results differ; to start with, what CUDA/cuDNN versions did your results originate from?