Closed JimZAI closed 1 year ago
Hi! Thanks for your interest in our work. Note that the default hyperparameters are not optimal for CIFAR-10. For each ID dataset, the posterior-sampling hyperparameters need to be tuned, especially those controlling the variances of the noise and the weights: sigma and sigma_n. To reproduce the paper's results on CIFAR-10, please download the checkpoints from https://www.dropbox.com/home/checkpoints_POEM/CIFAR-10. Here are the results I just obtained with the script. Hope it helps!
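For context on what these two hyperparameters control: in posterior sampling with a Bayesian linear layer, sigma typically acts as the prior standard deviation of the weights and sigma_n as the observation-noise standard deviation. The following is a minimal sketch of that idea only; the function names and shapes are illustrative and are not POEM's actual code:

```python
import numpy as np

def linear_posterior(Phi, y, sigma=1.0, sigma_n=0.1):
    """Posterior over weights for the model y = Phi @ w + noise.

    Prior: w ~ N(0, sigma^2 I); noise ~ N(0, sigma_n^2).
    Returns the posterior mean and covariance of w.
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma_n**2 + np.eye(d) / sigma**2
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma_n**2
    return mean, cov

def thompson_sample(mean, cov, rng):
    """Draw one weight vector from the posterior (a Thompson-sampling step)."""
    return rng.multivariate_normal(mean, cov)
```

Larger sigma makes the prior weaker (the posterior follows the data more), while larger sigma_n down-weights each observation, so the two jointly control how concentrated the sampled weights are. This is why both usually need retuning per ID dataset.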
So, could you provide the tuned hyper-parameters (e.g. sigma and sigma_n) for CIFAR-10 and CIFAR-100?
The hyperparameters were tuned with an extensive grid search. As the project was completed a long time ago, we decided to upload the checkpoints to the cloud and cleaned up the rest. I'm happy to help with tuning and provide the optimal values when computing resources are more available.
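For anyone attempting the search themselves, a plain grid over the two variance hyperparameters could be sketched like this. Note this is an illustrative helper, not the script used for the paper; `train_and_eval`, the grid values, and the choice of validation metric are all placeholders you would supply:

```python
import itertools

def grid_search(train_and_eval, sigmas, sigma_ns):
    """Exhaustively try (sigma, sigma_n) pairs and return the best.

    train_and_eval(sigma, sigma_n) should train a model and return a
    validation score where higher is better (e.g. AUROC on a held-out
    OOD split).
    """
    best_score, best_params = float("-inf"), None
    for sigma, sigma_n in itertools.product(sigmas, sigma_ns):
        score = train_and_eval(sigma, sigma_n)
        if score > best_score:
            best_score, best_params = score, (sigma, sigma_n)
    return best_params, best_score
```

Since each grid point involves a full training run, a coarse log-spaced grid followed by a finer grid around the best point keeps the cost manageable.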
Thanks a lot. I will try to search for the optimal ones and wait for your response. This issue will be closed once I successfully reproduce the results.
You are welcome. Without tuning, I just ran with the default hyperparameters, except setting 'conf' to 3.0 as specified in the paper. Here are the results I obtained, which closely match the reported ones. The codebase has been updated; if you still cannot reproduce the results with the latest version, please let me know. I'm also happy to help with implementation issues.
Thanks a lot, I'll try it.
Any idea about the hyperparameters (e.g., sigma and sigma_n) for the CIFAR-100 benchmark with ImageNet64 (random crop) as the auxiliary OOD training dataset? If I change the auxiliary OOD training dataset, is it necessary to tune the hyperparameters again? Thanks.
Hi, thanks for your insightful work.
When I tried to reproduce the results in Table 1 by running the following scripts in order:
I obtained the following results on the six datasets (over two runs), which are clearly worse than the reported ones (FPR95 2.54±0.56, AUROC 99.40±0.05, AUPR 99.50±0.07):

places365 (Energy Sum at epoch 100): FPR95: 4.10, AUROC: 98.74, AUPR: 98.72
LSUN (Energy Sum at epoch 100): FPR95: 22.56, AUROC: 96.27, AUPR: 96.96
LSUNresize (Energy Sum at epoch 100): FPR95: 0.00, AUROC: 100.00, AUPR: 100.00
iSUN (Energy Sum at epoch 100): FPR95: 0.00, AUROC: 100.00, AUPR: 100.00
dtd (Energy Sum at epoch 100): FPR95: 0.30, AUROC: 99.74, AUPR: 99.86
SVHN_ (Energy Sum at epoch 100): FPR95: 2.95, AUROC: 99.12, AUPR: 99.28

Avg FPR95: 4.99, Avg AUROC: 0.9898, Avg AUPR: 0.9914
Is there something wrong with my runs? Many thanks.