liudakai2 opened 1 year ago
Thank you for your interest in our work and sorry for the late reply.
Did you use the stage 1 checkpoint? The stage 1 checkpoint is for anomaly detection only (AUROC, AUPR, FPR), while the stage 2 checkpoint is for in-distribution classification (ACC).
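For reference, here is a minimal, repo-agnostic sketch of how stage 1 metrics of this kind are typically computed, assuming numpy arrays scores (higher means more ID-like) and labels (1 for ID, 0 for OOD); these names are illustrative, not from test.py:

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def ood_metrics(scores, labels):
    # AUROC/AUPR treat ID samples (label 1) as the positive class
    auroc = roc_auc_score(labels, scores)
    aupr = average_precision_score(labels, scores)
    # FPR95: fraction of OOD samples accepted when the threshold
    # admits 95% of ID samples
    threshold = np.percentile(scores[labels == 1], 5)
    fpr95 = np.mean(scores[labels == 0] >= threshold)
    return auroc, aupr, fpr95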
Hi, the stage 2 model provided in the Google Drive (https://drive.google.com/drive/folders/1Z2VkeY8e6XIyEu995bSJBV628MXBH5ZZ?usp=sharing) gives a KeyError when used in test.py. Could you please help?
Hello, thanks for your great work. I have the same question about reproducing the 77.08 ACC on CIFAR10-LT.
I used the stage 2 checkpoint provided in the Google Drive and ran the test command written in the README.md, but I got ACC: 0.7417. How can we achieve the results reported in the paper?
Hi, have you managed to achieve the 77.08 ACC on CIFAR10-LT reported in the paper? Thanks a lot!
Dear all,
@FengShuai-bupt @HDNU @zzuy @liudakai2
Sorry for the late reply. I have had quite a few things piled up over the last few months, and now I finally have some time to tackle this issue. :)
The stage 2 CIFAR10 model I uploaded should be tested using an older version of the test code, which I have provided as the val_cifar_only_id function in test.py. Pull the latest repo and use the following command to get the ~77.08 ACC on CIFAR10-LT reported in the paper. (The uploaded model has 77.63 accuracy; the reported 77.08 is averaged over several runs.)
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp <where_you_store_all_your_datasets> \
--ckpt_path <where_you_save_the_ckpt> \
--only_id
done
The old version of the code stored the main branch (i.e., the anomaly detection branch in stage 1) and the auxiliary branch (i.e., the ID classification branch in stage 2) separately.
The new version of the code updates the BN and FC layers in place at the second stage, so for new-version model ckpts the same test script can be used for both stage 1 and stage 2 models.
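To illustrate what "updating BN and FC in place" means, here is a minimal sketch (assuming a standard torchvision-style ResNet; this is not the actual stage2.py):

import torch.nn as nn

def freeze_all_but_bn_fc(model: nn.Module):
    # Freeze everything, then re-enable gradients only for BatchNorm and
    # Linear layers; the state_dict layout stays identical, so stage 1
    # and stage 2 ckpts can share one test script.
    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.Linear)):
            for p in m.parameters():
                p.requires_grad = True
    return [p for p in model.parameters() if p.requires_grad]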
But for the old-version model ckpt (i.e., the CIFAR10 model I uploaded), we need slightly different test code for stages 1 and 2. Specifically, we use the forward function to do inference in stage 1 and forward_aux_classifier in stage 2.
So the new command I provide here is only meant to be used for the CIFAR10 stage 2 model I uploaded to the Google Drive. The models obtained using the training script, or the CIFAR100 and ImageNet models I uploaded, shouldn't have this issue, since they are trained using the latest code version.
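In pseudocode, the dispatch looks roughly like this (a sketch with a hypothetical wrapper name; only forward and forward_aux_classifier come from the actual code):

def evaluate(model, x, only_id=False):
    if only_id:
        # old-format stage 2 ckpt: use the auxiliary ID-classification branch
        logits = model.forward_aux_classifier(x)
    else:
        # stage 1, and any new-format ckpt whose BN/FC were updated in place
        logits = model.forward(x)
    return logits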
Please let me know if you have any further questions.
Thank you again for your interest in this work!
Haotao
Sorry I don't have the authority to push to this repo anymore. I have made a pull request to reflect all necessary code updates. Will let my manager know to merge it tomorrow. :)
@HDNU @FengShuai-bupt @liudakai2 @zzuy
The update has been pushed. Thanks!
Hi, my understanding is that the following command is used for both OOD detection and ID classification:
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp <where_you_store_all_your_datasets> \
--ckpt_path <where_you_save_the_ckpt>
done
The only difference lies in the ckpt_path used, right?
@FengShuai-bupt
For me, the ckpts of the 1st and 2nd stages are saved as cifar10/stage1/latest.pth and cifar10/stage1/stage2/latest.pth, respectively.
To test stage 1 results (i.e., the OOD detection performance), use
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp <where_you_store_all_your_datasets> \
--ckpt_path cifar10/stage1/latest.pth
done
To test stage 2 results (i.e., the ID classification performance), use
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp <where_you_store_all_your_datasets> \
--ckpt_path cifar10/stage1/stage2/latest.pth
done
or
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp <where_you_store_all_your_datasets> \
--ckpt_path cifar10/stage1/stage2/latest.pth \
--only_id
done
if and only if you use the CIFAR10 ckpt I uploaded to Google Drive. The --only_id argument is only used in this special case, for compatibility with the older-version model.
I know this older version CIFAR10 ckpt is causing some confusion. I may replace it with the new version when I get time. :) But you can also run the training codes yourself on CIFAR10 to get the new version ckpt.
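If loading a ckpt raises a KeyError, a quick way to debug it is to print the keys actually stored in the file and compare them with what test.py tries to read (standard PyTorch, nothing repo-specific):

import torch

ckpt = torch.load('cifar10/stage1/stage2/latest.pth', map_location='cpu')
print(type(ckpt))
if isinstance(ckpt, dict):
    # old- and new-format ckpts may wrap the weights under different keys
    print(list(ckpt.keys())[:20])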
Thanks! I have run the training codes myself on CIFAR10-LT to get the new-version ckpt. However, for example, on texture, the testing results of stage 1 (i.e., the OOD detection performance) are:
==texture===
ACC: test_acc:0.7634 (many_acc:0.9290, median_acc:0.7312, low_acc:0.6407)
auroc: 0.9249, aupr: 0.8212, fpr95: 0.2440
ACC@FPR0.0: acc_at_fpr_level:0.7634 (many_ac:0.9290, median_acc:0.7312, low_acc:0.6407 | head_acc:0.8634, tail_acc:0.6634)
ACC@FPR0.001: acc_at_fpr_level:0.7639 (many_ac:0.9293, median_acc:0.7313, low_acc:0.6416 | head_acc:0.8634, tail_acc:0.6634)
ACC@FPR0.01: acc_at_fpr_level:0.7691 (many_ac:0.9322, median_acc:0.7355, low_acc:0.6487 | head_acc:0.8634, tail_acc:0.6634)
ACC@FPR0.1: acc_at_fpr_level:0.8079 (many_ac:0.9471, median_acc:0.7783, low_acc:0.6879 | head_acc:0.8634, tail_acc:0.6634)
The testing results of stage 2 (i.e., the ID classification performance) are:
===texture===
ACC@FPR0.0: overall_acc:0.7381 (many_acc:0.9283, median_acc:0.7195, low_acc:0.5727 | head_acc:0.8578, tail_acc:0.6184)
ACC@FPR0.001: overall_acc:0.7384 (many_acc:0.9287, median_acc:0.7194, low_acc:0.5732 | head_acc:0.8578, tail_acc:0.6184)
ACC@FPR0.01: overall_acc:0.7423 (many_acc:0.9316, median_acc:0.7218, low_acc:0.5788 | head_acc:0.8578, tail_acc:0.6184)
ACC@FPR0.1: overall_acc:0.7914 (many_acc:0.9470, median_acc:0.7637, low_acc:0.6536 | head_acc:0.8578, tail_acc:0.6184)
I don't know why the ID classification performance of stage 2 is lower than the stage 1 results. Any suggestions?
Can you share the command you used for the 2nd-stage training and testing? I'll look into it. In the meantime, you can try the old-version model if you just want quick testing on the original model.
Thanks! The stage 2 training:
python stage2.py --gpu 0 --ds cifar10 \
--drp \
--pretrained_exp_str
The stage 2 testing:
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp \
--ckpt_path <Results/LT_OOD_results/PASCL/cifar10-0.01-OOD30000/ResNet18/e200-b256-adam-lr0.001-wd0.0005-cos_Lambda0.5-Lambda20.1-T0.07-sign-k0.4-trial2/finetune_both_e10-b128-adam-lr0.0005-wd0.0005-cos_LA-tau1/> \
--only_id
done
Moreover, I have run the training codes myself on CIFAR10-LT to get several new-version ckpts; however, the ID classification performance of stage 2 is still lower than the stage 1 results. I don't know why this happens. Could you give me some advice? Thank you very much!
Hi, I am still getting a KeyError when loading stage2.pth with the command below. Do you know why?
for dout in texture svhn cifar tin lsun places365
do
python test.py --gpu 0 --ds cifar10 --dout $dout \
--drp
Hi, thanks for your great work. But I have recently failed to reproduce the results in your paper (e.g., the 77.08 ACC on CIFAR10-LT) using the provided test command and pretrained model. I got:
ACC: 0.7417 (0.9420, 0.7278, 0.5600)
Any suggestions?