eccv22-ood-workshop / ROBIN-dataset

ECCV 2022 Workshop: A Benchmark for Robustness to Individual Nuisances in Real-World Out-of-Distribution Shifts
http://www.ood-cv.org/

How to judge the "significantly different in IID performance with our baseline"? #13

Open FrankZhangRp opened 2 years ago

FrankZhangRp commented 2 years ago

Hi, is the IID performance of the released dataset the same as in the paper? There are two different backbone networks in the paper; is the backbone network fixed for this challenge? And how should we judge whether a result is "significantly different in IID performance with our baseline"? Thanks a lot!

wjm-wjm commented 2 years ago

Hi, I find that the Top-1 image classification accuracy approaches 100% after training on both the iid_test and nuisances validation sets. Have you run into this problem?

FrankZhangRp commented 2 years ago

I think training should use only train/Images, with both iid_test and nuisances used purely for testing. I am also confused about the model selection strategy; I think the model can only be selected on the iid_test set in order to evaluate an algorithm's OOD performance. Here are the baseline results of our implementation.

[screenshot: baseline results table]
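The selection strategy described above (pick the checkpoint by iid_test accuracy, then report that checkpoint's nuisance accuracy) can be sketched as follows. The per-epoch accuracy records below are hypothetical placeholders, not ROBIN numbers:

```python
# Hypothetical per-epoch accuracy records; in practice these would come
# from evaluating each saved checkpoint on the two test splits.
checkpoints = [
    {"epoch": 1, "iid_test_acc": 0.62, "nuisance_acc": 0.41},
    {"epoch": 2, "iid_test_acc": 0.71, "nuisance_acc": 0.48},
    {"epoch": 3, "iid_test_acc": 0.69, "nuisance_acc": 0.52},
]

# Select the model by IID performance only, then read off its OOD score.
best = max(checkpoints, key=lambda c: c["iid_test_acc"])
print(f"selected epoch {best['epoch']}, OOD accuracy {best['nuisance_acc']}")
```

Note that the checkpoint with the best iid_test accuracy (epoch 2 here) need not be the one with the best nuisance accuracy, which is exactly why the selection rule matters.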
wjm-wjm commented 2 years ago

I have run the following command: `python main.py --data /home/wjm/wjm/dataset/ROBINv1.0/train --val-data /home/wjm/wjm/dataset/ROBINv1.0/nuisances --pretrained`. The performance seems to approach 100% after one epoch of training; is something going wrong? Thanks for your kind help!
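One plausible cause of near-perfect validation accuracy is that validation images leaked into the training directory. A quick sanity check, assuming an ImageFolder-style layout (the directory paths in the usage note are illustrative), is to compare filenames across the two roots:

```python
from pathlib import Path

def split_overlap(train_root, val_root):
    """Return the set of filenames that appear under both dataset roots."""
    train_files = {p.name for p in Path(train_root).rglob("*") if p.is_file()}
    val_files = {p.name for p in Path(val_root).rglob("*") if p.is_file()}
    return train_files & val_files

# Usage (paths are illustrative):
# leaked = split_overlap("ROBINv1.0/train", "ROBINv1.0/nuisances")
# print(len(leaked), "filenames shared between train and validation")
```

An empty result does not prove the splits are disjoint (the same image could be saved under two names), but a non-empty one is a strong hint that the evaluation is contaminated.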

DTennant commented 2 years ago

> I think training should use only train/Images, with both iid_test and nuisances used purely for testing. I am also confused about the model selection strategy; I think the model can only be selected on the iid_test set in order to evaluate an algorithm's OOD performance. Here are the baseline results of our implementation.

Hi, the details of the model selection are still not confirmed by our sponsor, but we are planning to implement it as follows: we maintain an updated mean and variance of the Top-1 accuracy on our dataset, and for each team participating in our challenge the final performance should fall within the range mean ± 3 × variance. Our baseline will be the ResNet50 results in the paper.

> I have run the following command: `python main.py --data /home/wjm/wjm/dataset/ROBINv1.0/train --val-data /home/wjm/wjm/dataset/ROBINv1.0/nuisances --pretrained`. The performance seems to approach 100% after one epoch of training; is something going wrong? Thanks for your kind help!

Hi, the data processing script and the baselines for the three tasks will be released once we have confirmation from our sponsor. Thanks for your interest.

DTennant commented 2 years ago

Hi, I have updated the dataset and uploaded the evaluation code that we will use on the CodaLab server; you can check the code for answers to your questions.