deeplearning-wisc / poem

PyTorch implementation of POEM (Out-of-distribution detection with posterior sampling), ICML 2022

Cannot reproduce the results after re-training the network for 100 epochs #2

Closed: JimZAI closed this issue 1 year ago

JimZAI commented 1 year ago

Hi, thanks for your insightful work.

I tried to reproduce the results in Table 1 by running the following scripts in order:

  1. python train_poem.py --name POEM --in-dataset CIFAR-10 --auxiliary-dataset imagenet --epochs 100 --model-arch densenet
  2. python get_score.py --name POEM --in-dataset CIFAR-10 --model-arch densenet --test_epochs 100
  3. python get_results.py --name POEM --in-dataset CIFAR-10 --test_epochs 100

I obtained the following results on the six OOD datasets (across two tries), which are clearly below the reported ones (FPR95: 2.54±0.56, AUROC: 99.40±0.05, AUPR: 99.50±0.07):

Energy Sum at epoch 100:

- places365: FPR95 4.10, AUROC 98.74, AUPR 98.72
- LSUN: FPR95 22.56, AUROC 96.27, AUPR 96.96
- LSUNresize: FPR95 0.00, AUROC 100.00, AUPR 100.00
- iSUN: FPR95 0.00, AUROC 100.00, AUPR 100.00
- dtd: FPR95 0.30, AUROC 99.74, AUPR 99.86
- SVHN: FPR95 2.95, AUROC 99.12, AUPR 99.28

Avg FPR95: 4.99, Avg AUROC: 98.98, Avg AUPR: 99.14

Is there something wrong with my runs? Many thanks.

alvinmingwisc commented 1 year ago

Hi! Thanks for your interest in our work. Note that the default hyperparameters are not optimal for CIFAR-10. For each ID dataset, the posterior-sampling hyperparameters need to be tuned, especially those controlling the variances of the noise and the weights: sigma and sigma_n. To reproduce the results in the paper on CIFAR-10, please download the checkpoints from https://www.dropbox.com/home/checkpoints_POEM/CIFAR-10. Here are the results I just obtained by running the script with that checkpoint. Hope it helps!

[Screenshot: results obtained with the provided CIFAR-10 checkpoint (Dec 28, 2022)]
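
For context on those two hyperparameters, here is a generic posterior-sampling sketch for Bayesian linear regression (NumPy), only to illustrate where a weight-prior scale (sigma) and an observation-noise scale (sigma_n) enter; it is a sketch under those assumptions, not the repo's actual implementation, and the repo's sigma/sigma_n may be parameterized differently (e.g. as variances rather than standard deviations):

    import numpy as np

    rng = np.random.default_rng(0)

    def weight_posterior(X, y, sigma=2.0, sigma_n=1.0):
        """Posterior over w for y = X @ w + noise.

        sigma   : prior std of the weights (larger -> broader prior, more exploratory sampling)
        sigma_n : std of the observation noise (larger -> flatter, more uncertain posterior)
        """
        d = X.shape[1]
        precision = X.T @ X / sigma_n**2 + np.eye(d) / sigma**2  # posterior precision
        cov = np.linalg.inv(precision)                           # posterior covariance
        mean = cov @ X.T @ y / sigma_n**2                        # posterior mean
        return mean, cov

    # Toy usage: draw one plausible weight vector from the posterior (Thompson sampling)
    X = rng.normal(size=(200, 8))
    y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)
    mean, cov = weight_posterior(X, y, sigma=2.0, sigma_n=1.0)
    w_sampled = rng.multivariate_normal(mean, cov)
    scores = X @ w_sampled  # scores under the sampled hypothesis
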
JimZAI commented 1 year ago

So, could you provide the tuned hyper-parameters (e.g. sigma and sigma_n) for CIFAR-10 and CIFAR-100?

alvinmingwisc commented 1 year ago

> So, could you provide the tuned hyper-parameters (e.g. sigma and sigma_n) for CIFAR-10 and CIFAR-100?

The hyperparameters were tuned with an extensive grid search. As the project was completed a long time ago, we decided to upload the checkpoints to the cloud and clean up everything else. I'm happy to help with tuning and provide the optimal values once computing resources are more available.
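
In case it helps while searching, here is a minimal grid-search sketch, assuming train_poem.py exposes the two hyperparameters as --sigma and --sigma_n flags (the flag names and candidate grids are assumptions; check the script's argparse options for the actual names and sensible ranges):

    import itertools
    import subprocess

    sigmas   = [0.5, 1.0, 2.0, 5.0]  # assumed candidate values for sigma
    sigma_ns = [0.5, 1.0, 2.0]       # assumed candidate values for sigma_n

    for sigma, sigma_n in itertools.product(sigmas, sigma_ns):
        name = f"POEM_s{sigma}_sn{sigma_n}"
        # Train, score, and evaluate one configuration end to end.
        subprocess.run([
            "python", "train_poem.py", "--name", name,
            "--in-dataset", "CIFAR-10", "--auxiliary-dataset", "imagenet",
            "--epochs", "100", "--model-arch", "densenet",
            "--sigma", str(sigma),      # assumed flag name
            "--sigma_n", str(sigma_n),  # assumed flag name
        ], check=True)
        subprocess.run([
            "python", "get_score.py", "--name", name,
            "--in-dataset", "CIFAR-10", "--model-arch", "densenet",
            "--test_epochs", "100",
        ], check=True)
        subprocess.run([
            "python", "get_results.py", "--name", name,
            "--in-dataset", "CIFAR-10", "--test_epochs", "100",
        ], check=True)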

JimZAI commented 1 year ago

Thanks a lot. I will try to search for the optimal ones myself and wait for your response. I will close this issue once I successfully reproduce the results.

alvinmingwisc commented 1 year ago

> Thanks a lot. I will try to search for the optimal ones myself and wait for your response. I will close this issue once I successfully reproduce the results.

You are welcome. Without tuning, I just ran with the default hyperparameters except for setting 'conf' to 3.0, as specified in the paper. Here are the results I obtained, which closely match the reported ones. The codebase has been updated; if you still cannot reproduce the results with the latest version, please let me know. I'm also happy to help with any implementation issues.

[Screenshot: results with default hyperparameters and conf = 3.0 (Dec 29, 2022)]
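
For reference, that single run corresponds to the training command above with only the 'conf' value changed, e.g. (assuming the script exposes it as a --conf flag; verify against train_poem.py's argparse options):

    import subprocess

    # Default hyperparameters, with only 'conf' set to 3.0 as in the paper.
    subprocess.run([
        "python", "train_poem.py", "--name", "POEM",
        "--in-dataset", "CIFAR-10", "--auxiliary-dataset", "imagenet",
        "--epochs", "100", "--model-arch", "densenet",
        "--conf", "3.0",  # assumed flag name for the 'conf' hyperparameter
    ], check=True)
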
JimZAI commented 1 year ago

Thanks a lot, I'll try it.

lygjwy commented 1 year ago

Any idea about the hyperparameters (e.g. sigma and sigma_n) for the CIFAR-100 benchmark with the ImageNet64 random-crop dataset as the auxiliary OOD training dataset? If I change the auxiliary OOD training dataset, is it necessary to tune the hyperparameters again? Thanks.