GaParmar / clean-fid

PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]
https://www.cs.cmu.edu/~clean-fid/
MIT License
903 stars 69 forks

ImageNet-1k statistics? #4

Open mehdidc opened 3 years ago

mehdidc commented 3 years ago

Hello, thanks for the great work and the package. Are there any plans to release ImageNet-1k statistics? If not, I can try to compute them and provide the steps to reproduce.

GaParmar commented 3 years ago

Hi,

Thanks for the suggestion! If you can provide me with the details of the dataset and steps to produce them, I can add these statistics too.

Regards, Gaurav

mehdidc commented 3 years ago

Hey, I was able to compute the statistics in the clean and legacy_pytorch modes. I only needed to add the extension 'JPEG' to https://github.com/GaParmar/clean-fid/blob/main/cleanfid/utils.py#L50, because ImageNet-1k images have that extension. I think it would be nice to make the extensions parametrizable; I could open a PR for that.
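To illustrate what "parametrizable extensions" could look like, here is a minimal sketch. It is not clean-fid's actual code: `list_images` and `DEFAULT_EXTENSIONS` are hypothetical names, and matching the suffix case-insensitively is an assumption that goes slightly beyond simply adding 'JPEG' to the list.

```python
from pathlib import Path

# Hypothetical default set, not the list used by clean-fid.
DEFAULT_EXTENSIONS = frozenset({"jpg", "jpeg", "png", "bmp"})


def list_images(folder, extensions=DEFAULT_EXTENSIONS):
    """Recursively collect files whose suffix is in `extensions`, ignoring case."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return sorted(
        path
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower().lstrip(".") in wanted
    )
```

With case-insensitive matching, ImageNet files ending in .JPEG are picked up without listing every spelling, and a caller can pass a different set when needed.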

Here is the link for the stats (training and validation stats for both clean and legacy_pytorch modes): https://drive.google.com/drive/folders/1q7b-hqc-xUUGi9fGzfI1gVlJYk2Jji5h?usp=sharing

The legacy_tensorflow mode did not work. It raised an exception:

  return torch.stack(batch, 0, out=out)
  RuntimeError: stack expects each tensor to be equal size, but got [3, 250, 250] at entry 0 and [3, 150, 200] at entry 1

Here are the steps to reproduce so that you can compare with the above stats if you would like:

  1. Download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar from https://image-net.org/download.php
  2. Extract training: mkdir -p train && tar xvf ILSVRC2012_img_train.tar -C train. The archive contains one tar per class, so inside train run: for v in *.tar; do tar xvf $v; done
  3. Extract validation: mkdir -p valid && tar xvf ILSVRC2012_img_val.tar -C valid
  4. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_train", "train", mode="clean", num_workers=8, batch_size=128)'
  5. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_train", "train", mode="legacy_pytorch", num_workers=8, batch_size=128)'
  6. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_valid", "valid", mode="clean", num_workers=8, batch_size=128)'
  7. python -c 'from cleanfid import fid;fid.make_custom_stats("imagenet1k_valid", "valid", mode="legacy_pytorch", num_workers=8, batch_size=128)'
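Steps 4 to 7 can also be written as one small script. This is only a sketch of the same four calls: `stats_jobs` and `run_all` are names made up here, while the arguments passed to `fid.make_custom_stats` are the ones used in the steps above.

```python
from itertools import product

# Statistics name -> folder of extracted images, as in steps 2 and 3.
SPLITS = {"imagenet1k_train": "train", "imagenet1k_valid": "valid"}
# legacy_tensorflow is left out because it failed with the error quoted above.
MODES = ("clean", "legacy_pytorch")


def stats_jobs(splits=SPLITS, modes=MODES):
    """Return one (name, folder, mode) triple per make_custom_stats call."""
    return [
        (name, folder, mode)
        for (name, folder), mode in product(splits.items(), modes)
    ]


def run_all(num_workers=8, batch_size=128):
    """Compute every statistic listed by stats_jobs()."""
    # Imported lazily so the job list can be inspected without clean-fid installed.
    from cleanfid import fid

    for name, folder, mode in stats_jobs():
        fid.make_custom_stats(
            name, folder, mode=mode,
            num_workers=num_workers, batch_size=batch_size,
        )
```

Calling `run_all()` from the directory that contains train and valid would run the four jobs in the same order as the steps.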

The training set has 1281167 images and the validation set has 50000 images.

software stack:

torch==1.8.1+cu111
torchvision==0.9.1+cu111
numpy==1.19.0
scipy==1.6.3
pillow==8.2.0
requests==2.25.1
clean-fid==0.1.13

Also:

CUDA: 11.1.1
cuDNN: 8.0.4.30

GaParmar commented 3 years ago

Thanks for providing the details. I will take a look at the error with the "legacy_tensorflow" mode. I will verify/test the statistics with some pretrained models and get back to you when I upload them.

GaParmar commented 2 years ago

Hi,

I would be more careful with the steps used to process the ImageNet images. The steps you followed resize every ImageNet image without applying any crop, which might not be the commonly followed setting. See Section A.1 in this paper for some details: https://arxiv.org/pdf/2006.10738.pdf
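As an illustration of the difference, a center crop followed by a resize might look like the sketch below. It does not reproduce the exact recipe from the paper linked above: the helper names are hypothetical, and the bicubic filter and the choice of the largest centered square are assumptions.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def center_crop_and_resize(image, size):
    """Crop a PIL image to its central square, then resize it to size x size."""
    # Imported lazily: Pillow is only needed once an image is processed.
    from PIL import Image

    box = center_crop_box(*image.size)
    # Bicubic resampling is an assumption, not a setting taken from the paper.
    return image.crop(box).resize((size, size), Image.BICUBIC)
```

The cropped images could be written to a separate folder, and that folder passed to make_custom_stats in place of the raw one.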

Regards, Gaurav

machengcheng2016 commented 4 months ago

Hi Mehdi, do you remember what image size you rescaled to? 256x256 or 512x512? I ask because of this.