Hi,
These details are mentioned in our paper https://arxiv.org/abs/2010.09670. To briefly summarize:
how many attack steps do you take to get the robust acc in the leaderboard?
We use AutoAttack with the default settings.
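In case it's useful, here is roughly what that evaluation looks like when run by hand (a minimal sketch assuming the standalone autoattack package and tensors x_test / y_test of CIFAR-10 test images in [0, 1]; it is not the exact leaderboard script, and batching of the final forward pass is omitted):
import torch
from autoattack import AutoAttack
model.eval()
# version="standard" runs the default AutoAttack ensemble (APGD-CE, APGD-T, FAB-T, Square)
adversary = AutoAttack(model, norm="Linf", eps=8/255, version="standard")
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
# robust accuracy = fraction of adversarial examples still classified correctly
robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()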
And for the CIFAR-10-C, the accuracy under what severity is reported in the leaderboard?
We average the accuracy over all severity levels.
I would also like to know the num_of_examples used in the above evaluation so that I can better match my results with the existing ones.
For CIFAR-10-C, we use all 10k test images.
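As a minimal sketch of how that average is computed (assuming the load_cifar10c / clean_accuracy helpers and the CORRUPTIONS list that ship with robustbench, and that model is your torch.nn.Module), the leaderboard number averages over all corruption types and severity levels on the full test set:
from robustbench.data import CORRUPTIONS, load_cifar10c
from robustbench.utils import clean_accuracy
accs = []
for corruption in CORRUPTIONS:       # the 15 CIFAR-10-C corruption types
    for severity in range(1, 6):     # severity levels 1..5
        x_test, y_test = load_cifar10c(n_examples=10000, severity=severity,
                                       corruptions=[corruption])
        accs.append(clean_accuracy(model, x_test, y_test))
corruption_acc = sum(accs) / len(accs)  # the number reported on the leaderboard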
Finally, you can also just run the function benchmark()
to benchmark your models. An example of how to do it is available in the README:
import torch
from robustbench import benchmark
from myrobust_model import MyRobustModel  # placeholder for your own model definition

threat_model = "Linf"  # one of {"Linf", "L2", "corruptions"}
dataset = "cifar10"  # one of {"cifar10", "cifar100", "imagenet"}

model = MyRobustModel()
model_name = "<Name><Year><FirstWordOfTheTitle>"
device = torch.device("cuda:0")

clean_acc, robust_acc = benchmark(model, model_name=model_name, n_examples=10000, dataset=dataset,
                                  threat_model=threat_model, eps=8/255, device=device,
                                  to_disk=True)
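The same call also covers the common-corruptions evaluation: set threat_model = "corruptions" (the eps argument is only relevant for the Linf/L2 threat models, if I recall correctly).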
I hope that helps.
Hi, Max. Thank you so much for your great reply! 😀
Hi! I would like to know the exact details of the leaderboard evaluation. For example, for CIFAR-10, Linf, eps=8/255, how many attack steps do you take to get the robust acc in the leaderboard? And for the CIFAR-10-C, the accuracy under what severity is reported in the leaderboard? I would also like to know the num_of_examples used in the above evaluation so that I can better match my results with the existing ones. Thank you so much!