MadryLab / robustness

A library for experimenting with, training and evaluating neural networks, with a focus on adversarial robustness.
MIT License

please confirm hyperparameters used to train the pretrained models linked on the README #75

Closed · Embeddave closed this issue 4 years ago

Embeddave commented 4 years ago

Hi and thank you for putting this package together.

It's very helpful to have the pretrained models linked on the repo README.

Can you please just confirm: those models were all trained with the TRAINING_DEFAULTS from robustness.defaults?

E.g. for ResNet50 trained on CIFAR10, the hyperparameters would have been:

    datasets.CIFAR: {
        "epochs": 150,
        "batch_size": 128,
        "weight_decay":5e-4,
        "step_lr": 50
    }

(from https://github.com/MadryLab/robustness/blob/219dff192f1429f1580de1606df3971c32b0dbdb/robustness/defaults.py#L14)
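
For completeness, my reading of defaults.py is that lr and momentum come from TRAINING_ARGS (0.1 and 0.9 at that commit), so the full optimizer setup would be roughly the sketch below. The 10x StepLR decay every step_lr epochs is my assumption; please correct me if the decay factor differs.

import torch.nn as nn
from torch.optim import SGD, lr_scheduler

model = nn.Linear(1, 1)  # stand-in for ResNet50, just to build the optimizer
opt = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
schedule = lr_scheduler.StepLR(opt, step_size=50, gamma=0.1)  # gamma=0.1 assumed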

I tested with those hyperparameters and got similar accuracy (~95%). In case it helps anyone: I found I needed to run a few training batches through the ResNet before putting the model in eval mode. Here's the script I used to check:

from pathlib import Path

import dill
import numpy as np
import robustness.defaults
from robustness.datasets import CIFAR
from robustness.cifar_models.resnet import ResNet50
import torch

def load(model, resume_path):
    checkpoint = torch.load(resume_path, pickle_module=dill)
    sd = checkpoint['model']
    sd = {k[len('module.model.'):]:v for k,v in sd.items() if k.startswith('module.model.')}
    model.load_state_dict(sd)

def get_default_device():
    if torch.cuda.is_available():
        return 'cuda'
    else:
        return 'cpu'

# copy so we don't mutate the library's defaults dict in place
CIFAR10_TRAIN_DEFAULTS = dict(robustness.defaults.TRAINING_DEFAULTS[CIFAR])

for hyperparam_name in ['lr', 'momentum']:
    val = [el[-1] for el in robustness.defaults.TRAINING_ARGS if el[0] == hyperparam_name]
    assert len(val) == 1, f'did not find single value for {hyperparam_name}, value was: {val}'
    val = val[0]
    CIFAR10_TRAIN_DEFAULTS[hyperparam_name] = val

CIFAR10_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR10_STD = [0.2023, 0.1994, 0.2010]

DEVICE = get_default_device()
print(f'device: {DEVICE}')
# test pre-trained model on CIFAR10 dataset

cifar_dataset = CIFAR('/data/cifar-10-batches-py/')

resume_path = Path('data/cifar_nat.pt')

resnet = ResNet50()

load(resnet, resume_path)
resnet.to(DEVICE)

train_loader, test_loader = cifar_dataset.make_loaders(workers=4, batch_size=CIFAR10_TRAIN_DEFAULTS['batch_size'])

# run a few batches through with the model in `train` mode so the `batchnorm`
# running stats settle; see:
# https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/3
# without doing this, test set accuracy will be ~69%
N_BATCH = 100

for batch_num, batch in enumerate(train_loader):
    if batch_num > N_BATCH:
        break
    with torch.no_grad():
        x, y = batch[0].to(DEVICE), batch[1].to(DEVICE)
        out = resnet(x)

resnet.eval()

accs = []

for batch in test_loader:
    with torch.no_grad():
        x, y = batch[0].to(DEVICE), batch[1].to(DEVICE)
        out = resnet(x)
        y_pred = out.argmax(dim=1)
        accs.append(
            (y_pred == y).sum().cpu().numpy() / y.shape[0]
        )

# `robustness` repo readme claims 95.25% accuracy for this model
print(np.array(accs).mean())
Hadisalman commented 4 years ago

Hi @DavidN-EmbedIntel,

  1. Yes, the default hyperparameters are the ones used to train this model.

  2. You should be able to get 95.25% accuracy by evaluating the pretrained model without running any training batches. I looked into your code and noticed a few issues (fixed in the code below):

    • you need to normalize the input (if you use our AttackerModel, this is already included; see the note after the script below).
    • your accuracy calculation is slightly off: you can't take a plain mean of the per-batch accuracies, because the last batch is typically smaller than the rest (depending on your batch size) and gets over-weighted; see the toy example below.
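
To make the second point concrete, here is a toy illustration (the numbers are hypothetical, just to show the bias):

# 10,000 test images with batch_size=128 -> 78 full batches plus one of 16.
# Suppose every full batch is 100% correct and the small last batch is 0%:
batch_sizes = [128] * 78 + [16]
per_batch_acc = [1.0] * 78 + [0.0]

naive = sum(per_batch_acc) / len(per_batch_acc)  # ~0.9873, over-weights the last batch
weighted = sum(a * n for a, n in zip(per_batch_acc, batch_sizes)) / sum(batch_sizes)
print(naive, weighted)  # weighted = 0.9984, which is what counting correct samples gives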

Here is a modification of your code that should give exactly 95.25% validation accuracy.

from pathlib import Path

import dill
import numpy as np
import robustness.defaults
from robustness.datasets import CIFAR
from robustness.cifar_models.resnet import ResNet50
from robustness.tools.helpers import InputNormalize
import torch

def load(model, resume_path):
    checkpoint = torch.load(resume_path, pickle_module=dill)
    sd = checkpoint['model']
    sd = {k[len('module.model.'):]:v for k,v in sd.items() if k.startswith('module.model.')}
    model.load_state_dict(sd)

def get_default_device():
    if torch.cuda.is_available():
        return 'cuda'
    else:
        return 'cpu'

DEVICE = get_default_device()
print(f'device: {DEVICE}')
# test pre-trained model on CIFAR10 dataset

cifar_dataset = CIFAR('/data/cifar-10-batches-py/')

resume_path = Path('data/cifar_nat.pt')

resnet = ResNet50()

load(resnet, resume_path)
resnet.to(DEVICE)

train_loader, test_loader = cifar_dataset.make_loaders(workers=4, batch_size=64)

resnet.eval()

CIFAR10_MEAN = [0.4914, 0.4822, 0.4465]
CIFAR10_STD = [0.2023, 0.1994, 0.2010]

normalizer = InputNormalize(torch.Tensor(CIFAR10_MEAN).to(DEVICE), torch.Tensor(CIFAR10_STD).to(DEVICE))
correct = 0
with torch.no_grad():
    for batch in test_loader:
        x, y = batch[0].to(DEVICE), batch[1].to(DEVICE)
        out = resnet(normalizer(x))
        y_pred = out.argmax(dim=1)
        correct += (y_pred == y).sum().item()

# `robustness` repo readme claims 95.25% accuracy for this model
print(100. * correct / len(test_loader.dataset))
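
As an aside: if you load the checkpoint through our make_and_restore_model instead of loading the state dict manually, the returned AttackerModel wraps the normalization for you. Roughly (a sketch; note that the forward pass returns a (logits, input) tuple):

import torch
from robustness.datasets import CIFAR
from robustness.model_utils import make_and_restore_model

ds = CIFAR('/data/cifar-10-batches-py/')
model, _ = make_and_restore_model(arch='resnet50', dataset=ds,
                                  resume_path='data/cifar_nat.pt')
model.eval()

x = torch.rand(8, 3, 32, 32).cuda()  # dummy CIFAR-shaped batch; GPU assumed here
with torch.no_grad():
    logits, _ = model(x)  # normalization happens inside the model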

Hope this helps! Let me know if you have any further questions.

andrewilyas commented 4 years ago

Hi @DavidN-EmbedIntel, closing this issue for now since it seems resolved. Feel free to open a new issue or comment here if you're still having any trouble.

Embeddave commented 4 years ago

Yes please feel free to close.

Thank you @Hadisalman and @andrewilyas! Your reply and edits to the script were very helpful, and thank you again for sharing the code + models.