deeplearning-wisc / MCM

PyTorch implementation of MCM (Delving into out-of-distribution detection with vision-language representations), NeurIPS 2022

Details about configuration of Waterbirds-Spurious OOD #7

Closed by debby1103 1 year ago

debby1103 commented 1 year ago

Are there any configuration details or pre-split datasets available? Thanks!

debby1103 commented 1 year ago

Should I use the train set of Waterbirds, or the validation set?

alvinmingsf commented 1 year ago

Hi! The Waterbirds dataset (https://github.com/kohpangwei/group_DRO) is constructed by cropping out birds from photos in the Caltech-UCSD Birds-200-2011 (CUB) dataset (Wah et al., 2011) and transferring them onto backgrounds from the Places dataset (Zhou et al., 2017). For evaluation, the ID val set of Waterbirds is used. The background spurious OOD test set (which contains spuriously correlated background images) can be downloaded here: https://drive.google.com/file/d/1CBe9f8yHIlQnXYmNQj45DsqB5vCT6qQ6/view?usp=share_link

debby1103 commented 1 year ago

Thanks for your help! I got the Waterbirds dataset (waterbird_complete95_forest2water2) and the OOD dataset. I found 1.2k validation samples (split=1) and used these images as ID samples; the OOD samples are all 10k images from Spurious OOD. I adopted the 200 bird categories as ID classes. However, the FPR95 score for MCM is much higher than expected (33.67 vs. 5.87). Is there anything wrong with my experiment setup? Thanks again!
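For readers comparing numbers in this thread: FPR95 is the fraction of OOD samples that score at or above the threshold at which 95% of ID samples are retained. Below is a minimal sketch, assuming that a higher score means "more in-distribution". The function name `fpr_at_95_tpr` is hypothetical, and this is not the repository's evaluation code; other implementations may handle ties or interpolate the threshold differently.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted when ~95% of ID samples are accepted.

    Assumption: higher score = more likely in-distribution.
    """
    sorted_id = sorted(id_scores)
    # Index of the lowest ID score that is still accepted, so that at least
    # 95% of ID samples score at or above the threshold.
    k = (len(sorted_id) * 5) // 100
    threshold = sorted_id[k]
    # An OOD sample is a false positive if it passes the same threshold.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


if __name__ == "__main__":
    id_scores = list(range(1, 101))   # toy ID scores 1..100
    ood_scores = [0, 3, 7, 50]        # toy OOD scores
    print(fpr_at_95_tpr(id_scores, ood_scores))
```

In the toy example the threshold falls at 6, so two of the four OOD scores pass it. Because the metric depends only on how ID and OOD scores are ordered relative to each other, a change in which images count as ID (for example, the Waterbirds validation split versus the original CUB images) can shift it substantially.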

alvinmingsf commented 1 year ago

Hi! I just tested CUB (ID) vs. Placesbg (spurious OOD) and here are the results I get:

FPR95   AUROC   AUPR
5.90    98.38   96.04

If I decrease the temperature (T) from 1 to 0.01, the results are significantly worse:

FPR95   AUROC   AUPR
41.89   87.80   78.99

What is the T you used?
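As a rough illustration of where T enters: the MCM score is the maximum softmax probability over temperature-scaled cosine similarities between an image embedding and the class text embeddings. The sketch below uses plain Python lists in place of CLIP features. The names `l2_normalize` and `mcm_score` and the toy vectors are assumptions made for illustration, not the repository's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mcm_score(image_feat, text_feats, T=1.0):
    """Maximum softmax probability over cosine similarities divided by T.

    image_feat: one image embedding (list of floats).
    text_feats: one text embedding per ID class (list of lists).
    """
    img = l2_normalize(image_feat)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_feats]
    scaled = [s / T for s in sims]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    return max(exps) / sum(exps)


if __name__ == "__main__":
    image = [1.0, 0.0]                 # toy image embedding
    texts = [[1.0, 0.0], [0.0, 1.0]]   # toy class text embeddings
    print(mcm_score(image, texts, T=1.0))
    print(mcm_score(image, texts, T=0.01))
```

With these toy vectors the score is about 0.731 at T=1 and essentially 1.0 at T=0.01. Lowering T sharpens the softmax, so scores for all inputs move toward 1, which may compress the gap between ID and OOD scores and would be consistent with the worse FPR95 reported above.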

debby1103 commented 1 year ago

Oops, I see now. I successfully replicated the CUB (ID) vs. Placesbg (spurious OOD) result: 5.71 FPR95 with T=1 and 41.89 FPR95 with T=0.01. Thanks for your timely help!

lpg0502 commented 7 months ago

Hello, I'm also having difficulty reproducing the spurious OOD results. First, I'd like to ask about the CUB (ID) vs. Placesbg (spurious OOD) setup mentioned above. Does CUB refer to the validation set of waterbird_complete95_forest2water2, or to the original CUB_200_2011 dataset? When I use CUB_200_2011 as the ID set, I obtain 7.78/98.26, but when I use waterbird_complete95_forest2water2, the overall results drop significantly. I'm not sure whether there is a mistake in my setup. Thank you!