Open max-andr opened 4 years ago
Hey. Thank you. The pretrained weights to reproduce the numbers in the paper can be found here: https://drive.google.com/drive/folders/1nWKuvdcjg6PRGE4ewjdZXP-3FusNMauJ?usp=sharing
Thanks a lot for providing the models so quickly!
Hi all,
thanks for sharing the models. I evaluated the model named CIFAR10_resnet18_pretrained_ce_lambda8.0_74per.pth with AutoAttack+ at eps = 8/255 and got clean accuracy: 91.03%, robust accuracy: 0.00%.
Note that when evaluated only with the cross-entropy loss, the robustness was above 51%.
Could you please check if you get similar results, just to be sure the model is used correctly (the clean accuracy I get matches that reported in your paper)?
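For context on why the two figures can differ this much: AutoAttack reports a worst-case number, so a test point counts as robust only if every attack in the ensemble fails on it. Below is a minimal sketch of that aggregation. The attack names match AutoAttack components, but the function name and the per-sample outcomes are hypothetical and purely illustrative; they are not measurements for this model.

```python
from typing import Dict, List


def worst_case_robust_accuracy(survived: Dict[str, List[bool]]) -> float:
    """Fraction of samples that withstand every attack in the ensemble.

    survived[name][i] is True when sample i is still classified correctly
    after attack `name`.
    """
    per_attack = list(survived.values())
    n = len(per_attack[0])
    if any(len(s) != n for s in per_attack):
        raise ValueError("all attacks must cover the same samples")
    # A sample is robust only if it survives all attacks.
    robust = [all(s[i] for s in per_attack) for i in range(n)]
    return sum(robust) / n


# Hypothetical outcomes for 4 samples (illustrative, not measured).
outcomes = {
    "apgd-ce": [True, True, False, False],
    "apgd-dlr": [False, True, False, False],
    "square": [True, False, True, False],
}
print(worst_case_robust_accuracy({"apgd-ce": outcomes["apgd-ce"]}))  # 0.5
print(worst_case_robust_accuracy(outcomes))  # 0.0
```

In this toy case a single cross-entropy attack leaves half of the points standing, while the ensemble leaves none.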
Hi everyone,
I was also surprised by the results claimed in this paper. I did a bit of testing myself with the AutoAttack from @fra31 and found the following results:
Dataset | Training | File | Natural Acc | Claimed Robust Acc | AutoAttack Acc |
---|---|---|---|---|---|
SVHN | Magnet Loss | SVHN_resnet18_only_alpha5_41per.pth | 91.95 | 38.59 | 0.90 |
SVHN | ClusTR | SVHN_resnet18_pretrained_alpha13_52per.pth | 94.28 | 50.77 | 0.29 |
SVHN | ? | SVHN_resnet18_pretrained_ce_lambda9_76per.pth | 93.81 | ? | 0.87 |
SVHN | ClusTR + QTRADES | SVHN_all_in_lambda_9.7_86per.pth | 95.06 | 84.75 | 2.30 |
CIFAR10 | Magnet Loss | CIFAR10_resnet_magnetonly.pth | 83.14 | 22.54 | 0.01 |
CIFAR10 | ClusTR | CIFAR10_resnet18_pretrained_alpha12.5_52per.pth | 87.34 | 47.76 | 0.00 |
CIFAR10 | ClusTR + QTRADES | CIFAR10_resnet18_pretrained_ce_lambda8.0_74per.pth | 91.03 | 74.04 | 0.03 |
CIFAR100 | ? | CIFAR100_resnet18_pretrained_alpha8.5_ce_2.0_42per.pth | 69.43 | ? | 0.01 |
I am also wondering if I did something wrong in my evaluation. Could you check?
Hi there,
Thank you for your interest in our work.
Also, thanks for pointing out this finding. We think it's very weird that you're getting these numbers. In our implementation, we provided the code for conducting the standard PGD attack with random restarts (which was in turn taken from https://github.com/anonymous-sushi-armadillo/fast_is_better_than_free_CIFAR10/blob/master/evaluate_cifar.py#L45). Have you been able to reproduce the numbers we report by running the implementation we provide?
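For readers unfamiliar with the attack named above, here is a minimal, framework-free sketch of L-inf PGD with random restarts. It is not the code from either repository: the function name, its parameters, and the toy loss are assumptions made for illustration, and a real evaluation would take gradients of the network loss with PyTorch instead of the hand-written `toy_grad`.

```python
import random
from typing import Callable, List, Sequence


def pgd_linf(
    loss: Callable[[Sequence[float]], float],
    grad: Callable[[Sequence[float]], Sequence[float]],
    x: Sequence[float],
    eps: float,
    step_size: float,
    steps: int,
    restarts: int,
    seed: int = 0,
) -> List[float]:
    """Maximise `loss` inside the L-inf ball of radius eps around x.

    Inputs are assumed to live in [0, 1], as image pixels do.
    """
    rng = random.Random(seed)
    best, best_loss = list(x), loss(x)
    for _ in range(restarts):
        # Random start: uniform noise inside the eps-ball, clipped to [0, 1].
        adv = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x]
        for _ in range(steps):
            g = grad(adv)
            # Signed-gradient ascent step.
            adv = [a + step_size * ((gi > 0) - (gi < 0)) for a, gi in zip(adv, g)]
            # Project back onto the eps-ball, then onto the valid pixel range.
            adv = [min(xi + eps, max(xi - eps, a)) for a, xi in zip(adv, x)]
            adv = [min(1.0, max(0.0, a)) for a in adv]
        # Keep the strongest perturbation found across restarts.
        current = loss(adv)
        if current > best_loss:
            best, best_loss = adv, current
    return best


# Toy problem (hypothetical): push a 2-pixel "image" away from its clean value.
x = [0.5, 0.5]


def toy_loss(z: Sequence[float]) -> float:
    return sum((zi - xi) ** 2 for zi, xi in zip(z, x))


def toy_grad(z: Sequence[float]) -> List[float]:
    return [2.0 * (zi - xi) for zi, xi in zip(z, x)]


adv = pgd_linf(toy_loss, toy_grad, x, eps=8 / 255, step_size=2 / 255, steps=10, restarts=3)
```

The random start matters in this toy case because the gradient is exactly zero at the clean point, so an attack started there would never move.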
Other thoughts on possible reasons behind this (apparently) inconsistent behavior:
- Did you use a MagnetModelWrapper? Like the one we provide at https://github.com/clustr-official-account/ClusTR-Clustering-Training-For-Robustness/blob/master/utils/attacks.py#L77. This class internally handles the distance-based classification.
- The L entry in the magnet_data dictionary is in turn passed to the MagnetModelWrapper class (here https://github.com/clustr-official-account/ClusTR-Clustering-Training-For-Robustness/blob/master/utils/attacks.py#L114). What value for L did you try?

Update: since this issue was consistently pointed out by @fra31 and @jeromerony, we'll be running experiments with @MotasemAlfarra on our side to try and find the source of this apparent malfunction. We'll try to get back to you as soon as possible with our findings. Again, thank you all for pointing out this issue.
(Although we've externally contacted the participants in this issue, we are posting this in the GitHub issue directly for future reference).
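To make the question about L concrete, here is a rough sketch of distance-based classification over cluster centers, in the spirit of the Magnet loss evaluation rule. It is not the repository's MagnetModelWrapper: the function name and signature are hypothetical, and only the argument names mirror the magnet_data keys mentioned in this thread.

```python
import math
from typing import Dict, Sequence


def magnet_class_scores(
    embedding: Sequence[float],
    cluster_centers: Sequence[Sequence[float]],
    cluster_classes: Sequence[int],
    variance: float,
    L: int,
) -> Dict[int, float]:
    """Class scores from squared distances to the L nearest cluster centers."""
    sq_dists = [
        sum((e - c) ** 2 for e, c in zip(embedding, center))
        for center in cluster_centers
    ]
    # Indices of the L clusters closest to the embedding.
    nearest = sorted(range(len(sq_dists)), key=sq_dists.__getitem__)[:L]
    # Gaussian-style weight per cluster (assumes distances small enough not to underflow).
    weights = {i: math.exp(-sq_dists[i] / (2.0 * variance)) for i in nearest}
    total = sum(weights.values())
    scores: Dict[int, float] = {}
    for i, w in weights.items():
        cls = cluster_classes[i]
        scores[cls] = scores.get(cls, 0.0) + w / total
    return scores


# Hypothetical 2-D embedding space: two clusters per class.
centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
classes = [0, 0, 1, 1]
point = [0.1, 0.0]
top1 = magnet_class_scores(point, centers, classes, variance=1.0, L=1)
top3 = magnet_class_scores(point, centers, classes, variance=1.0, L=3)
```

With L=1 the score collapses onto the class of the single nearest cluster, while a larger L spreads the score over several clusters, which is one reason the value of L can change what an attack sees.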
I made a simple script to evaluate. It should be placed in the root of the repository with the files autoattack.py, autopgd_pt.py, fab_pt.py, square.py and other_utils.py from the autoattack repository https://github.com/fra31/auto-attack/.
```python
import argparse
from typing import Tuple

import torch
from torch import nn, Tensor
from torch.backends import cudnn
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import SVHN, CIFAR10, CIFAR100

from autoattack import AutoAttack
from datasets.load_dataset import DATASET_MEANS, DATASET_STDS
from models.resnet import ResNet18
from utils.attacks import MagnetModelWrapper


def requires_grad_(model: nn.Module, requires_grad: bool) -> None:
    for param in model.parameters():
        param.requires_grad_(requires_grad)


class NormalizedModel(nn.Module):
    """Normalizes inputs given in [0, 1] before calling the wrapped model."""

    def __init__(self, model: nn.Module, mean: Tuple[float, float, float], std: Tuple[float, float, float]) -> None:
        super(NormalizedModel, self).__init__()
        self.model = model
        self.register_buffer('mean', torch.as_tensor(mean).view(1, 3, 1, 1))
        self.register_buffer('std', torch.as_tensor(std).view(1, 3, 1, 1))

    def forward(self, input: Tensor) -> Tensor:
        normalized_input = (input - self.mean) / self.std
        return self.model(normalized_input)


parser = argparse.ArgumentParser()
parser.add_argument('--seed', '-s', default=42, type=int)
parser.add_argument('--flag', '-f', default='deterministic', type=str)
parser.add_argument('--batch-size', '-b', default=512, type=int)
# The dataset and checkpoint are needed below, so they are required here.
parser.add_argument('--dataset', '-d', required=True, type=str, choices=['cifar10', 'cifar100', 'svhn'])
parser.add_argument('--log-file', '-l', default=None, type=str)
parser.add_argument('--checkpoint', '-c', required=True, type=str)
parser.add_argument('--num_samples', '-n', default=None, type=int)
parser.add_argument('--epsilon', '-e', default=8 / 255, type=float)
args = parser.parse_args()

# Sets e.g. cudnn.deterministic = True or cudnn.benchmark = True.
setattr(cudnn, args.flag, True)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

num_classes = 10
if args.dataset == 'cifar10':
    dataset = CIFAR10('data', train=False, transform=transforms.ToTensor(), download=True)
elif args.dataset == 'cifar100':
    dataset = CIFAR100('data', train=False, transform=transforms.ToTensor(), download=True)
    num_classes = 100
else:  # 'svhn'
    dataset = SVHN('data', split='test', transform=transforms.ToTensor(), download=True)

mean, std = DATASET_MEANS[args.dataset], DATASET_STDS[args.dataset]
mean_t = torch.tensor(mean, device=device).view(1, 3, 1, 1)
std_t = torch.tensor(std, device=device).view(1, 3, 1, 1)

# Load the weights and the cluster data stored alongside them in the checkpoint.
m = ResNet18(num_classes=num_classes)
checkpoint = torch.load(args.checkpoint, map_location=device)
m.load_state_dict(checkpoint['state_dict'], strict=False)
magnet_data = {
    'cluster_classes': checkpoint['cluster_classes'],
    'cluster_centers': checkpoint['cluster_centers'],
    'variance': checkpoint['variance'],
    'L': checkpoint['L'],
    'K': checkpoint['K'],
    'normalize_probs': checkpoint['normalize_probs']
}

magnet_model = MagnetModelWrapper(model=m, magnet_data=magnet_data, mean=mean_t, std=std_t)
model = NormalizedModel(model=magnet_model, mean=mean, std=std)
model.eval()
model.to(device)
requires_grad_(model, False)

torch.manual_seed(seed=args.seed)

# Load the first n test samples as a single batch.
n = len(dataset) if args.num_samples is None else args.num_samples
loader = DataLoader(dataset=dataset, batch_size=n)
batch, labels = next(iter(loader))

attack = AutoAttack(model=model, eps=args.epsilon, log_path=args.log_file,
                    attacks_to_run=['apgd-dlr', 'apgd-ce', 'square', 'fab'])  # apgd-dlr being more successful, this reduces overall run-time
attack.run_standard_evaluation(batch, labels, bs=args.batch_size)
```
This reproduces the evaluation results mentioned above. Tell me if you find something done incorrectly in this script.
Was anyone able to reproduce the CIFAR-10 results using Auto Attack?
Hi all,
Thanks for an interesting paper. Is it possible to open source the ClusTR + QTRADES model that achieves 74.04% PGD-100 accuracy for eps=8/255 on CIFAR-10 (from Table 1 of your paper)? This seems like a very strong result, so it would be interesting to check it in more detail.
Best, Maksym