[Open] FrankZhangRp opened this issue 2 years ago
Hi, I find that the Top-1 performance of the image classification baseline approaches 100% after training, on both the iid_test and nuisances validation sets. Have you encountered this problem?
I think training should use only train/Images, with both iid_test and nuisances reserved for testing. I am also confused about the model selection strategy: I think the model can only be selected on the iid_test set when evaluating an algorithm's OOD performance. Here are the baseline results of our implementation.
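A minimal sketch of the selection strategy suggested above: pick the checkpoint purely by iid_test Top-1, then report that checkpoint's accuracy on the nuisance splits. All names and numbers here are hypothetical, not from the ROBIN codebase.

```python
# Hypothetical sketch: model selection on iid_test only; the nuisance
# (OOD) accuracy is never used to choose the checkpoint, only reported.

def select_by_iid(results):
    """results: list of per-epoch dicts with 'epoch', 'iid_top1', 'nuisance_top1'."""
    best = max(results, key=lambda r: r["iid_top1"])
    return best["epoch"], best["nuisance_top1"]

if __name__ == "__main__":
    # Made-up training history for illustration.
    history = [
        {"epoch": 1, "iid_top1": 0.81, "nuisance_top1": 0.55},
        {"epoch": 2, "iid_top1": 0.86, "nuisance_top1": 0.58},
        {"epoch": 3, "iid_top1": 0.84, "nuisance_top1": 0.61},
    ]
    epoch, ood_top1 = select_by_iid(history)
    # Epoch 2 is selected (best iid_top1), even though epoch 3 has better OOD accuracy.
    print(epoch, ood_top1)
```

Note that under this rule the reported OOD number can be lower than the best OOD number seen during training, which is the point: the nuisance sets stay strictly held out.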
I have run the following command: "python main.py --data /home/wjm/wjm/dataset/ROBINv1.0/train --val-data /home/wjm/wjm/dataset/ROBINv1.0/nuisances --pretrained". The performance seems to approach 100% after one epoch of training; is something going wrong? Thanks for your kind help!
Hi, the details of model selection have not yet been confirmed by our sponsor, but we are planning to implement it as follows: we maintain an updated mean and variance of the Top-1 accuracy on our dataset, and for each team participating in our challenge, the final performance should fall within the range mean ± 3 × variance. Our baseline will be the ResNet50 results in the paper.
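The acceptance rule described above can be sketched as a simple range check. This is only an illustration of the stated mean ± 3 × variance criterion; the function name and the threshold numbers below are made up, not taken from the challenge code.

```python
# Hypothetical sketch of the submission sanity check described above:
# a team's Top-1 accuracy must lie within mean +/- k * variance.

def within_expected_range(top1, mean, variance, k=3.0):
    """Return True if the submitted Top-1 accuracy is within the accepted band."""
    return (mean - k * variance) <= top1 <= (mean + k * variance)

# With mean=0.80 and variance=0.02 the accepted band is [0.74, 0.86]:
print(within_expected_range(0.85, mean=0.80, variance=0.02))  # True
print(within_expected_range(0.99, mean=0.80, variance=0.02))  # False (0.99 > 0.86)
```

A suspiciously high score, such as the near-100% accuracy reported earlier in this thread, would fall outside the band and be flagged.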
Hi, the data processing script and the baselines for the three tasks will be released once we have confirmation from our sponsor. Thanks for your interest.
Hi, I have updated the dataset and uploaded the evaluation code that we will be using on the CodaLab server; you can check the code for answers to your questions.
Hi, is the IID performance on the released dataset the same as in the paper? There are two different backbone networks in the paper; are the backbones fixed for this challenge? And how will "significantly different in IID performance from our baseline" be judged? Thanks a lot!