Closed h-aboutalebi closed 2 years ago
Hi! I think I found where the issue is: I apply normalization to the inputs when I load the model (e.g., https://github.com/RobustBench/robustbench/blob/master/robustbench/model_zoo/architectures/xcit.py#L95), even though, when I trained the models, I applied no normalization. Let me fix and test this!
Meanwhile, if you want to quickly see how to reproduce the results without using load_model from here, you can take a look at how I evaluated the model for the paper here.
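For context, the mismatch can be illustrated with a minimal, self-contained sketch (made-up single-channel numbers, not the actual XCiT code): a model trained on raw [0, 1] pixels receives shifted inputs once a normalization wrapper is added at load time.

```python
# Standard ImageNet normalization stats (one channel shown for brevity).
MEAN, STD = 0.485, 0.229

def normalize(x):
    """Wrapper applied at load time in the buggy version."""
    return (x - MEAN) / STD

# A pixel value the model saw during (unnormalized) training...
pixel = 0.5
# ...arrives shifted and rescaled at evaluation time, so the network
# operates far from the input distribution it was trained on:
shifted = normalize(pixel)
print(pixel, shifted)
```

Removing the wrapper (or retraining with normalization) restores the input distribution the weights expect.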
Hi @h-aboutalebi! This should be fixed on master now, with 513d60c. Now you should be able to reproduce the reported numbers with
python -m robustbench.eval --n_ex=5000 --dataset=imagenet --threat_model=Linf --model_name=Debenedetti2022Light_XCiT-S12 --data_dir=/data/imagenet --batch_size=128 --eps=0.0156862745
Of course, as --model_name you can also specify Debenedetti2022Light_XCiT-M12 or Debenedetti2022Light_XCiT-L12.
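As a side note, the --eps value in the command above is just the Linf budget 4/255 written out in decimal:

```python
# The Linf perturbation budget 4/255 expanded as a decimal,
# matching the --eps flag in the eval command.
eps = 4 / 255
print(f"{eps:.10f}")  # 0.0156862745
```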
Please, feel free to re-open the issue if this doesn't work!
Hi @dedeswim, thanks for your help. Unfortunately, I still cannot replicate the results; now the accuracy has dropped even further. Thanks!
Hi @h-aboutalebi how are you trying to reproduce the results? Would you mind sharing your code if possible? Moreover, what are the exact numbers you are getting?
Hi @dedeswim, I cannot reproduce the results reported in the table for ImageNet with epsilon 4/255 for XCiT-S12, XCiT-M12, and XCiT-L12. The accuracy I get is much lower for all three models: around 45% robust and 50% clean. Can you please help me reproduce the results? Am I missing something?