KaiyangZhou / ssdg-benchmark

Benchmarks for semi-supervised domain generalization.
MIT License

Not able to reproduce numbers for PACS dataset. #10

Open Griffintaur opened 2 years ago

Griffintaur commented 2 years ago

I ran your code but could not reproduce the reported numbers for the PACS dataset. My results for 5 samples per class, alongside the numbers reported in the paper, are as follows.

Domain         Obtained   Reported
Art_Painting   77.18      78.54
Cartoon        73.74      74.44
Photo          89.35      89.25
Sketch         76.5       79.06
Avg            79.1925    80.32

I can understand the roughly 1 percentage point difference in the first three domains, but:

  1. The number reported for the sketch domain is about 2.6 percentage points higher than what I obtained (79.06 vs. 76.5).
  2. The numbers reported for 5 samples per class are higher than those reported in the paper for 10 samples per class, which confuses me.

Can you please clarify this?
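
For reference, the per-domain gaps can be computed directly from the figures quoted above. This is only a small sketch for checking the arithmetic; the dictionary layout and the `gaps` helper are illustrative names, not part of the repository.

```python
# Accuracy (%) copied from the table above: (obtained, reported in the paper).
results = {
    "art_painting": (77.18, 78.54),
    "cartoon": (73.74, 74.44),
    "photo": (89.35, 89.25),
    "sketch": (76.5, 79.06),
}


def gaps(table):
    """Return reported minus obtained, in percentage points, per domain."""
    return {name: round(rep - obt, 2) for name, (obt, rep) in table.items()}


def average(values):
    """Plain arithmetic mean of a sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


per_domain = gaps(results)
avg_obtained = average(obt for obt, _ in results.values())
avg_reported = average(rep for _, rep in results.values())

for name, gap in per_domain.items():
    print(f"{name:13s} gap: {gap:+.2f}")
print(f"average gap: {avg_reported - avg_obtained:+.2f}")
```

A positive gap means the reported number is higher than the one obtained here; only the photo domain comes out slightly negative.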

KaiyangZhou commented 2 years ago

In my case, the results of different seeds don't deviate too much. Please see below

Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed1/log.txt. acc: 78.37%. err: 21.63%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed2/log.txt. acc: 80.66%. err: 19.34%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed3/log.txt. acc: 79.00%. err: 21.00%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed4/log.txt. acc: 76.46%. err: 23.54%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting/seed5/log.txt. acc: 78.22%. err: 21.78%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/art_painting
* acc: 78.54% +- 1.35%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed1/log.txt. acc: 70.35%. err: 29.65%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed2/log.txt. acc: 77.65%. err: 22.35%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed3/log.txt. acc: 72.70%. err: 27.30%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed4/log.txt. acc: 77.01%. err: 22.99%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon/seed5/log.txt. acc: 74.49%. err: 25.51%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/cartoon
* acc: 74.44% +- 2.71%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed1/log.txt. acc: 89.46%. err: 10.54%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed2/log.txt. acc: 84.91%. err: 15.09%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed3/log.txt. acc: 91.14%. err: 8.86%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed4/log.txt. acc: 91.56%. err: 8.44%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo/seed5/log.txt. acc: 89.16%. err: 10.84%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/photo
* acc: 89.25% +- 2.36%
===
Parsing output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed1/log.txt. acc: 77.06%. err: 22.94%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed2/log.txt. acc: 81.64%. err: 18.36%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed3/log.txt. acc: 76.93%. err: 23.07%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed4/log.txt. acc: 79.40%. err: 20.60%
file: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch/seed5/log.txt. acc: 80.27%. err: 19.73%
===
outcome of directory: output/ssdg_pacs/nlab_105/Ours/resnet18/v1/sketch
* acc: 79.06% +- 1.83%
===
overall average
* acc: 80.32%
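
The summary lines in the log above can be cross-checked from the per-seed accuracies. The sketch below assumes the reported spread is a population standard deviation (dividing by the number of seeds); that assumption is inferred from the numbers matching, not taken from the parsing script itself. The `summarize` helper is a hypothetical name.

```python
from statistics import mean, pstdev

# Per-seed accuracies (%) copied from the log above, seeds 1 to 5.
seed_acc = {
    "art_painting": [78.37, 80.66, 79.00, 76.46, 78.22],
    "cartoon": [70.35, 77.65, 72.70, 77.01, 74.49],
    "photo": [89.46, 84.91, 91.14, 91.56, 89.16],
    "sketch": [77.06, 81.64, 76.93, 79.40, 80.27],
}


def summarize(accs):
    """Return (mean, population std) of a list of accuracies."""
    return mean(accs), pstdev(accs)


summary = {name: summarize(accs) for name, accs in seed_acc.items()}
overall = mean(m for m, _ in summary.values())

for name, (m, s) in summary.items():
    print(f"{name:13s} acc: {m:.2f}% +- {s:.2f}%")
print(f"overall average acc: {overall:.2f}%")
```

With the population formula the four spreads come out as 1.35, 2.71, 2.36 and 1.83, matching the log. A sample standard deviation (dividing by 4 instead of 5) would give larger values, for example about 1.51 for art_painting.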

Just to double-check: did you run each experiment 5 times as programmed? Did you use the data provided by this code?

Griffintaur commented 2 years ago

Yes, I used the exact same settings and the data you provided to run the experiments.

KaiyangZhou commented 2 years ago

Hmm, it's strange that you got such a large deviation on the sketch domain. I can't explain this. But in my experience, the PACS dataset and its protocol sometimes lead to a big deviation in results, which my colleagues have also observed. Anyway, let me know if you figure out the cause.

Griffintaur commented 2 years ago

Can you share the development environment in which you ran those experiments? The numbers seem too high, at least for the sketch domain. If possible, could you also rerun the sketch domain to check whether the results are reproducible?

KaiyangZhou commented 2 years ago

FYI

Collecting env info ...
** System info **
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
Clang version: Could not collect
CMake version: version 2.8.12.2

Python version: 3.7 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB
GPU 2: Tesla V100-PCIE-32GB
GPU 3: Tesla V100-PCIE-32GB
GPU 4: Tesla V100-PCIE-32GB
GPU 5: Tesla V100-PCIE-32GB
GPU 6: Tesla V100-PCIE-32GB
GPU 7: Tesla V100-PCIE-32GB

Nvidia driver version: 418.67
cuDNN version: /usr/local/cuda-9.0/lib64/libcudnn.so.7.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.7.1
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h6bb024c_0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py37he8ac12f_0
[conda] mkl_fft                   1.3.0            py37h54f3939_0
[conda] mkl_random                1.1.1            py37h0573a6f_0
[conda] numpy                     1.20.1                   pypi_0    pypi
[conda] numpy-base                1.19.2           py37hfa32c7d_0
[conda] pytorch                   1.7.1           py3.7_cuda10.1.243_cudnn7.6.3_0    pytorch
[conda] torchvision               0.8.2                py37_cu101    pytorch
        Pillow (8.1.0)
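
To compare a local setup against the listing above, one option is to look up the installed package versions and report any mismatch. This is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the `REFERENCE` dictionary and the `diff_versions` helper are hypothetical names, and only the version strings come from the listing.

```python
from importlib.metadata import PackageNotFoundError, version

# Library versions copied from the environment listing above.
REFERENCE = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "numpy": "1.20.1",
    "Pillow": "8.1.0",
}


def diff_versions(expected, lookup=version):
    """Return {package: (expected, installed)} for every mismatch.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


for name, (want, have) in diff_versions(REFERENCE).items():
    print(f"{name}: expected {want}, found {have}")
```

The comparison is an exact string match, so a pip build that reports a local suffix such as `1.7.1+cu101` will show up as a mismatch and needs to be judged by eye. Matching library versions does not by itself guarantee identical results, since the GPU model, driver and cuDNN build can also differ.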