Hi,
These details are mentioned in our paper https://arxiv.org/abs/2010.09670. To briefly summarize:
how many attack steps do you take to get the robust acc in the leaderboard?
We use AutoAttack with the default settings.
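In case it's useful, here is roughly what that evaluation looks like when run by hand (a minimal sketch assuming the standalone autoattack package and tensors x_test / y_test of CIFAR-10 test images in [0, 1]; it is not the exact leaderboard script, and batching of the final forward pass is omitted):
import torch
from autoattack import AutoAttack
model.eval()
# version="standard" runs the default AutoAttack ensemble (APGD-CE, APGD-T, FAB-T, Square)
adversary = AutoAttack(model, norm="Linf", eps=8/255, version="standard")
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=250)
# robust accuracy = fraction of adversarial examples still classified correctly
robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()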
And for the CIFAR-10-C, the accuracy under what severity is reported in the leaderboard?
We average the accuracy over all severity levels.
I would also like to know the num_of_examples used in the above evaluation so that I can better match my results with the existing ones.
For CIFAR-10-C, we use all 10k test images.
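As a minimal sketch of how that average is computed (assuming the load_cifar10c / clean_accuracy helpers and the CORRUPTIONS list that ship with robustbench, and that model is your torch.nn.Module), the leaderboard number averages over all corruption types and severity levels on the full test set:
from robustbench.data import CORRUPTIONS, load_cifar10c
from robustbench.utils import clean_accuracy
accs = []
for corruption in CORRUPTIONS:       # the 15 CIFAR-10-C corruption types
    for severity in range(1, 6):     # severity levels 1..5
        x_test, y_test = load_cifar10c(n_examples=10000, severity=severity,
                                       corruptions=[corruption])
        accs.append(clean_accuracy(model, x_test, y_test))
corruption_acc = sum(accs) / len(accs)  # the number reported on the leaderboard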
Finally, you can also just run the function benchmark()
to benchmark your models. An example of how to do it is available in the README:
import torch
from robustbench import benchmark
from myrobust_model import MyRobustModel  # placeholder for your own model definition

threat_model = "Linf"  # one of {"Linf", "L2", "corruptions"}
dataset = "cifar10"  # one of {"cifar10", "cifar100", "imagenet"}

model = MyRobustModel()
model_name = "<Name><Year><FirstWordOfTheTitle>"
device = torch.device("cuda:0")

clean_acc, robust_acc = benchmark(model, model_name=model_name, n_examples=10000, dataset=dataset,
                                  threat_model=threat_model, eps=8/255, device=device,
                                  to_disk=True)
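The same call also covers the common-corruptions evaluation: set threat_model = "corruptions" (the eps argument is only relevant for the Linf/L2 threat models, if I recall correctly).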
I hope that helps.
Hi, Max. Thank you so much for your great reply! 😀
Hi! I would like to know the exact details of the leaderboard evaluation. For example, for CIFAR-10, Linf, eps=8/255, how many attack steps do you take to get the robust acc in the leaderboard? And for the CIFAR-10-C, the accuracy under what severity is reported in the leaderboard? I would also like to know the num_of_examples used in the above evaluation so that I can better match my results with the existing ones. Thank you so much!