RobustBench / robustbench

RobustBench: a standardized adversarial robustness benchmark [NeurIPS 2021 Benchmarks and Datasets Track]
https://robustbench.github.io

[New Model] <Tian2022Deeper> #99

Closed · ruitian12 closed this issue 1 year ago

ruitian12 commented 2 years ago

Paper Information

Leaderboard Claim(s)

Model 1

Model 2

Model Zoo:

dedeswim commented 2 years ago

Hi, thanks for the submission! We will add it as soon as possible

dedeswim commented 2 years ago

Hi @ruitian12! A small update: I have seen that you may be using a model from timm. Can you please confirm that the architectures you are using are indeed deit_{small,base}_patch16_224 in timm? If this is the case, then I will add the model as soon as #100 (which adds better support for timm models) is merged

ruitian12 commented 2 years ago

Hi @dedeswim, thanks for reaching out. The models we use are identical to the deit_{small,base}_patch16_224 architectures in timm, and our checkpoints can be loaded directly into them. So it should be straightforward to add our models once the timm support is merged.
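
For readers following along, here is a minimal sketch of how such a checkpoint could be loaded into the matching timm architecture; the checkpoint filename and the wrapper keys checked below are illustrative assumptions, not details from this thread:

```python
import timm
import torch

# Hypothetical checkpoint path; the actual file would come from the authors' release.
ckpt_path = "tian2022deeper_deit_small.pth"

# deit_small_patch16_224 / deit_base_patch16_224 are standard timm architectures.
model = timm.create_model("deit_small_patch16_224", pretrained=False)

state_dict = torch.load(ckpt_path, map_location="cpu")
# Some checkpoints wrap the weights under a "state_dict" or "model" key.
for key in ("state_dict", "model"):
    if isinstance(state_dict, dict) and key in state_dict:
        state_dict = state_dict[key]

model.load_state_dict(state_dict)
model.eval()
```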

dedeswim commented 2 years ago

That's great, thanks for letting me know!

dedeswim commented 2 years ago

Hi @ruitian12, do you have unaggregated results, as we do for the other models here and here?

ruitian12 commented 2 years ago

Hi, sorry for the late reply. I have uploaded the corresponding results for ImageNet-C and ImageNet-3DCC.

dedeswim commented 2 years ago

Hi! Sorry for my late reply, but I've been very busy recently. Thanks for the update, I will finalize the addition of your model ASAP!

dedeswim commented 2 years ago

Hi @ruitian12, I am sorry it's taking so long, but I realized there are some changes to make before integrating new corruption models. Once #111 is merged, we will be able to merge the branch with your model.

Thanks for your patience!

dedeswim commented 2 years ago

Never mind, I manually computed the mCE values so that I can merge your PR now. I am adding it with PR #112; if everything looks good to you, I will merge it.
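
For context, the mCE on ImageNet-C follows Hendrycks & Dietterich (2019): for each corruption, the model's error rates over the five severities are summed and divided by the corresponding sum for the AlexNet baseline, and these ratios are averaged over corruptions. A small sketch under that definition (the error-rate dictionaries are placeholders, not data from this thread):

```python
def mean_corruption_error(model_err, alexnet_err):
    """model_err and alexnet_err map corruption name -> list of error rates,
    one per severity 1..5, e.g. {"gaussian_noise": [0.31, 0.38, ...], ...}."""
    ces = []
    for corruption, errs in model_err.items():
        # Corruption Error: summed errors normalized by the AlexNet baseline.
        ces.append(sum(errs) / sum(alexnet_err[corruption]))
    # mCE: mean over all corruptions, usually reported as a percentage.
    return sum(ces) / len(ces)
```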

ruitian12 commented 2 years ago

Hi @dedeswim, sorry for the late reply, and thanks for your great efforts. If useful, we can also provide results on the full IN-3DCC and IN-C datasets (including mCE), since the previous results are based on the 5000-image subset. Many thanks!

dedeswim commented 2 years ago

Hi @ruitian12! Thanks for the update. I noticed that there is a significant difference between these and the results you originally posted, which is something we didn't observe with other models on the leaderboard. Did anything change between the two evaluations?

ruitian12 commented 2 years ago

Hi @dedeswim, thanks for checking! Since the full-dataset evaluation cannot run on a single GPU, we did change the evaluation code: the full-dataset results for IN-3DCC come from our own multi-GPU implementation, while the 5000-sample IN-C results are based on the open-source code. Note that the pre-processing for the IN-C evaluation was originally 'Crop224', which we also adopted for the full-dataset evaluation and which differs from the 'Res256Crop224' used in robustbench.
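
For clarity, the two preprocessing names presumably correspond to pipelines along the following lines (an assumption based on their names, not copied from the robustbench source):

```python
from torchvision import transforms

# 'Res256Crop224': resize the short side to 256 first, then take a 224 center crop.
res256_crop224 = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# 'Crop224': take the 224 center crop directly from the original-resolution image.
crop224 = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```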

dedeswim commented 2 years ago

Thanks for your reply @ruitian12! Were the first results computed using Res256Crop224, and these later ones on the full dataset with Crop224? If this is correct, would you mind reporting the results on 5000 images using Crop224, please? We would prefer to add those to the leaderboard, to keep it consistent with the other entries (computed on 5000 samples), but we also want to report the best possible result for your entry (which seems to be given by Crop224 rather than Res256Crop224). Thanks!

dedeswim commented 2 years ago

FYI, you can change the preprocessing method by passing preprocessing='Crop224' to the benchmark function.
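
As a rough sketch of that call, assuming the robustbench.eval.benchmark interface and that `model` is the loaded DeiT model from above (keyword names other than `preprocessing` are my best reading of the API, not quoted from this thread):

```python
import torch
from robustbench.eval import benchmark

clean_acc, robust_acc = benchmark(
    model,
    dataset="imagenet",
    threat_model="corruptions",   # ImageNet-C evaluation
    n_examples=5000,              # the subset size used on the leaderboard
    batch_size=64,
    preprocessing="Crop224",      # the option suggested in this thread
    device=torch.device("cuda"),
)
```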

ruitian12 commented 1 year ago

Hi @dedeswim, thanks for your careful checks and kind suggestions! After going over the evaluation code, I figured out that the discrepancy is caused by missing input normalization when testing on 5000 samples with robustbench. I will update our results within the next two days after fixing the normalization and preprocessing. Many thanks!
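
One common way to make such a fix robust is to wrap the model so that ImageNet mean/std normalization happens inside forward(); this is only a hedged sketch of that pattern, not the authors' actual code:

```python
import torch
import torch.nn as nn

class NormalizedModel(nn.Module):
    """Applies ImageNet mean/std normalization inside forward(), so the model
    can safely be fed unnormalized [0, 1] image tensors."""

    def __init__(self, model):
        super().__init__()
        self.model = model
        self.register_buffer("mu", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("sigma", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x):
        return self.model((x - self.mu) / self.sigma)
```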

ruitian12 commented 1 year ago

Hi @dedeswim, sorry for the late update. I have uploaded the corrected results on IN-C and IN-3DCC for both the 5000-sample subset and the full datasets. In particular, I noticed that there are performance gaps between different Pillow versions; the uploaded results were evaluated with Pillow==8.2.0. Thanks again for your careful suggestions.

ruitian12 commented 1 year ago

Hi @dedeswim, sorry to bother you about the merge. I have already updated the results as mentioned above. Feel free to let me know if there are any additional concerns. Thanks!

max-andr commented 1 year ago

Hi @ruitian12,

I've updated the leaderboard https://robustbench.github.io/#div_imagenet_corruptions_heading with your models. The numbers are close to your latest ones but not exactly the same (~0.4% difference). This doesn't change the ranking, though (your models are clearly top-1 and top-2 by a large margin).

I've opened a PR (based on Edoardo's earlier PR) that adds better support for 2DCC and 3DCC evaluation. I used the following scripts to evaluate your models:

I also used Pillow 9.4.0, so maybe that contributed to the difference. Let us know what you think about these evaluation results.

Best, Maksym

ruitian12 commented 1 year ago

Hi @max-andr, the evaluation results look sound, since we also saw a slight performance drop with an upgraded Pillow version in previous experiments. Thanks for your great efforts!

Best, Rui

max-andr commented 1 year ago

Perfect! Then I'll close this issue.