Lingkai-Kong / SDE-Net

Code for paper: SDE-Net: Equipping Deep Neural Networks with Uncertainty Estimates
Apache License 2.0

Reproduction of the results #7

Closed philipperemy closed 4 years ago

philipperemy commented 4 years ago

@Lingkai-Kong

I ran the Python commands from the repo but could not reproduce the results quoted in the paper.

Of course, this was just one run, but the values seem too low, even allowing for the reported standard deviation.

MNIST RESNET
_________________________________

Final Accuracy: 9945/10000 (99.45%)

generate log  from out-of-distribution data
calculate metrics for OOD
OOD  Performance of Baseline detector
TNR at TPR 95%:            88.783%
AUROC:                     95.939%
Detection acc:             92.169%
AUPR In:                   86.441%
AUPR Out:                  98.434%

calculate metrics for mis
mis  Performance of Baseline detector
TNR at TPR 95%:            89.791%
AUROC:                     97.510%
Detection acc:             93.041%
AUPR In:                   99.985%
AUPR Out:                  34.000%

MNIST SDENET
_________________________________

Final Accuracy: 9927/10000 (99.27%)

generate log  from out-of-distribution data
calculate metrics for OOD
OOD  Performance of Baseline detector
TNR at TPR 95%:            99.372%
AUROC:                     99.804%
Detection acc:             98.692%
AUPR In:                   99.483%
AUPR Out:                  99.887%
calculate metrics for mis
mis  Performance of Baseline detector
TNR at TPR 95%:            92.544%
AUROC:                     97.525%
Detection acc:             94.485%
AUPR In:                   99.979%
AUPR Out:                  41.739%

SVHN RESNET
_________________________________

Final Accuracy: 24609/25856 (95.18%)

generate log  from out-of-distribution data
calculate metrics for OOD
OOD  Performance of Baseline detector
TNR at TPR 95%:            66.552%
AUROC:                     94.421%
Detection acc:             90.136%
AUPR In:                   97.639%
AUPR Out:                  84.998%
calculate metrics for mis
mis  Performance of Baseline detector
TNR at TPR 95%:            64.376%
AUROC:                     90.458%
Detection acc:             85.371%
AUPR In:                   99.301%
AUPR Out:                  44.899%

SVHN SDENET
_________________________________

Final Accuracy: 24588/25856 (95.10%)

generate log  from out-of-distribution data
calculate metrics for OOD
OOD  Performance of Baseline detector
TNR at TPR 95%:            65.215%
AUROC:                     94.308%
Detection acc:             89.746%
AUPR In:                   97.694%
AUPR Out:                  84.017%
calculate metrics for mis
mis  Performance of Baseline detector
TNR at TPR 95%:            67.831%
AUROC:                     91.267%
Detection acc:             86.501%
AUPR In:                   99.270%
AUPR Out:                  48.871%


philipperemy commented 4 years ago

For example, for SVHN SDENET with CIFAR as the OOD set, we get AUPR Out: 84.017%, but the paper reports 93.7±0.9.

philipperemy commented 4 years ago

I understand that a single run cannot match 93.7 exactly, since that is an average, but 84% seems quite low (same as the other methods).

Model is save_sdenet_svhn
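
To put "quite low" in numbers, the gap can be expressed in units of the paper's reported standard deviation. This is only a rough sketch: the helper name `deviation_in_stds` is hypothetical, and it treats 93.7±0.9 as mean ± one standard deviation, which is an assumption about how the paper reports its results.

```python
def deviation_in_stds(observed, reported_mean, reported_std):
    """Signed distance of one run from the reported mean, in reported stds."""
    return (observed - reported_mean) / reported_std


# AUPR Out from the single SVHN SDENET run vs. the value quoted from the paper.
gap = deviation_in_stds(84.017, 93.7, 0.9)
print(f"{gap:.1f} standard deviations from the reported mean")
```

A gap of this size is hard to attribute to run-to-run variance alone, which is why the parameters are worth checking.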

Lingkai-Kong commented 4 years ago

With the parameters in the repo, the performance should be slightly better than in the paper's table. It is strange that your OOD performance is so low for SVHN. I have changed the SVHN parameters back to the original values. Let me know if it still doesn't work for you on SVHN.

philipperemy commented 4 years ago

OK, it looks better now, I guess. We get 91.7% and the paper quotes 93.7±0.9.

Load model
load target data:  svhn
Building SVHN data loader with 1 workers
Using downloaded and verified file: ../data/svhn/train_32x32.mat
Using downloaded and verified file: ../data/svhn/test_32x32.mat
load non target data:  cifar10
Building CIFAR-10 data loader with 1 workers
Files already downloaded and verified
Files already downloaded and verified
generate log from in-distribution data

 Final Accuracy: 24345/25856 (94.16%)

generate log  from out-of-distribution data
calculate metrics for OOD
OOD  Performance of Baseline detector
TNR at TPR 95%:            83.940%
AUROC:                     97.134%
Detection acc:             92.115%
AUPR In:                   98.848%
AUPR Out:                  91.691%
calculate metrics for mis
mis  Performance of Baseline detector
TNR at TPR 95%:            66.844%
AUROC:                     92.667%
Detection acc:             87.156%
AUPR In:                   99.357%
AUPR Out:                  54.045%

Lingkai-Kong commented 4 years ago

It seems to work now. The remaining gap should just be the variance of the results. I just ran the model five more times. Below are my results:


run1:

Final Accuracy: 24413/25856 (94.42%)

generate log from out-of-distribution data
calculate metrics for OOD
OOD Performance of Baseline detector
TNR at TPR 95%:            83.466%
AUROC:                     97.212%
Detection acc:             92.173%
AUPR In:                   98.933%
AUPR Out:                  91.916%
calculate metrics for mis
mis Performance of Baseline detector
TNR at TPR 95%:            67.344%
AUROC:                     92.087%
Detection acc:             86.855%
AUPR In:                   99.336%
AUPR Out:                  52.726%


run2:

Final Accuracy: 24273/25856 (93.88%)

generate log from out-of-distribution data
calculate metrics for OOD
OOD Performance of Baseline detector
TNR at TPR 95%:            87.640%
AUROC:                     97.761%
Detection acc:             92.734%
AUPR In:                   99.153%
AUPR Out:                  93.575%
calculate metrics for mis
mis Performance of Baseline detector
TNR at TPR 95%:            65.276%
AUROC:                     92.183%
Detection acc:             86.330%
AUPR In:                   99.325%
AUPR Out:                  54.205%


run3:

Final Accuracy: 24375/25856 (94.27%)

generate log from out-of-distribution data
calculate metrics for OOD
OOD Performance of Baseline detector
TNR at TPR 95%:            92.325%
AUROC:                     98.500%
Detection acc:             94.053%
AUPR In:                   99.410%
AUPR Out:                  95.651%
calculate metrics for mis
mis Performance of Baseline detector
TNR at TPR 95%:            65.903%
AUROC:                     91.588%
Detection acc:             85.871%
AUPR In:                   99.268%
AUPR Out:                  53.556%


run4:

Final Accuracy: 24416/25856 (94.43%)

generate log from out-of-distribution data
calculate metrics for OOD
OOD Performance of Baseline detector
TNR at TPR 95%:            85.298%
AUROC:                     97.520%
Detection acc:             92.393%
AUPR In:                   99.018%
AUPR Out:                  93.336%
calculate metrics for mis
mis Performance of Baseline detector
TNR at TPR 95%:            66.203%
AUROC:                     91.960%
Detection acc:             86.838%
AUPR In:                   99.314%
AUPR Out:                  52.331%


run5:

Final Accuracy: 24345/25856 (94.16%)

generate log from out-of-distribution data
calculate metrics for OOD
OOD Performance of Baseline detector
TNR at TPR 95%:            90.076%
AUROC:                     98.126%
Detection acc:             93.207%
AUPR In:                   99.264%
AUPR Out:                  94.566%
calculate metrics for mis
mis Performance of Baseline detector
TNR at TPR 95%:            66.815%
AUROC:                     92.440%
Detection acc:             86.975%
AUPR In:                   99.357%
AUPR Out:                  54.796%


Average:

Accuracy:                  94.232 +- 0.226
TNR at TPR 95%:            87.761 +- 3.561
AUROC:                     97.824 +- 0.505
Detection acc:             92.920 +- 0.738
AUPR In:                   99.156 +- 0.190
AUPR Out:                  93.908 +- 1.399

As you can see, not every single run falls within the average +- std.
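
The summary can be recomputed from the per-run OOD numbers listed above. The following is a minimal sketch, not code from the repo: it assumes the averages are plain means and that the spread is the sample standard deviation (n-1 denominator), which reproduces the posted accuracy and TNR figures. The names `runs`, `summarize`, and `runs_within_one_std` are hypothetical. Because the inputs are the rounded values posted here, some recomputed figures may differ slightly from the posted summary.

```python
from statistics import mean, stdev

# Per-run values (percent) copied from run1..run5 above, OOD detection only.
runs = {
    "Accuracy": [94.42, 93.88, 94.27, 94.43, 94.16],
    "TNR at TPR 95%": [83.466, 87.640, 92.325, 85.298, 90.076],
    "AUROC": [97.212, 97.761, 98.500, 97.520, 98.126],
    "Detection acc": [92.173, 92.734, 94.053, 92.393, 93.207],
    "AUPR In": [98.933, 99.153, 99.410, 99.018, 99.264],
    "AUPR Out": [91.916, 93.575, 95.651, 93.336, 94.566],
}


def summarize(values):
    """Return (mean, sample standard deviation) of the per-run values."""
    return mean(values), stdev(values)  # stdev divides by n - 1


def runs_within_one_std(values):
    """For each run, report whether it lies inside mean +- one std."""
    m, s = summarize(values)
    return [abs(v - m) <= s for v in values]


for name, values in runs.items():
    m, s = summarize(values)
    inside = sum(runs_within_one_std(values))
    print(f"{name}: {m:.3f} +- {s:.3f} ({inside}/{len(values)} runs within one std)")
```

With only five runs, it is normal for one or two of them to land outside the one-standard-deviation band, so a single run below the reported interval is not by itself a sign of a broken setup.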

philipperemy commented 4 years ago

Yes, looks good now, thank you! @Lingkai-Kong