MadryLab / robustness

A library for experimenting with, training and evaluating neural networks, with a focus on adversarial robustness.
MIT License

How to finetune a model and robustly train it #96

Closed elcronos closed 3 years ago

elcronos commented 3 years ago

I see in the examples that there are specific ways to load datasets such as ImageNet and CIFAR. However, I have a custom dataset with 10 labels, organized into train/test folders. How can I fine-tune a pretrained model such as ResNet50, replace its head so the last layer has 10 outputs, and adversarially train it using this library?

Hadisalman commented 3 years ago

Hi @elcronos!

Here are the steps to do so.

  1. Add your custom dataset to the lib as described here.

  2. Create a model and modify the last layer of that model, e.g.

    
    from robustness.datasets import MyNewDataSet
    from robustness.model_utils import make_and_restore_model
    from torch import nn

    ds = MyNewDataSet('/path/to/dataset/')
    attacker_model, _ = make_and_restore_model(arch='resnet50', pytorch_pretrained=True, dataset=ds)

    num_ftrs = attacker_model.model.fc.in_features
    num_classes = 10  # or whatever your custom dataset has

    # Replace the last layer of your model with a layer that fits your custom dataset
    attacker_model.model.fc = nn.Linear(num_ftrs, num_classes)

    # Next: continue with adversarial training as you normally would


3. Run adversarial training as you normally would for CIFAR-10 or ImageNet (e.g. [here](https://robustness.readthedocs.io/en/latest/example_usage/cli_usage.html#training-a-robust-resnet-50-for-the-restricted-imagenet-dataset)).

For more examples of how to fine-tune using our library, check out our [code-base on transfer learning](https://github.com/microsoft/robust-models-transfer).

Hope this helps. Please let us know if you have any further questions!
elcronos commented 3 years ago

Hi @Hadisalman,

Thanks for your quick response. I tried to follow those steps, but I'm still getting some errors.

This is what I added in datasets.py:

class MyDataset(DataSet):
    def __init__(self, data_path,**kwargs):
        self.num_classes = 1000

        ds_kwargs = {
            'num_classes': self.num_classes,
            'mean': torch.tensor([0.4859, 0.4131, 0.3083]),
            'std': torch.tensor([0.2919, 0.2507, 0.2273]),
            'transform_train': da.TRAIN_TRANSFORMS_IMAGENET,
            'transform_test': da.TEST_TRANSFORMS_IMAGENET
        }
        super(MyDataset, self).__init__('mydataset', data_path, **ds_kwargs)

    def get_model(self, arch, pretrained):

        return imagenet_models.__dict__[arch](num_classes=self.num_classes,
                                        pretrained=pretrained)

Then, I tried the following code for the adversarial training:

import torch
from torch import nn
from robustness.datasets import MyDataset
from robustness.model_utils import make_and_restore_model
from cox.utils import Parameters
from cox import store
from robustness import model_utils, datasets, train, defaults

ds = MyDataset('/path/to/my_model/MyDataset', batch_size=8)
m, _ = make_and_restore_model(arch='resnet50', pytorch_pretrained=True,
                              dataset=ds)
train_loader, val_loader = ds.make_loaders(batch_size=64, workers=8)
# Create a cox store for logging
OUT_DIR = './outputs'
out_store = store.Store(OUT_DIR)

num_ftrs = m.model.fc.in_features
num_classes = 10 # or whatever your custom dataset has

# Replace the last layer of your model with a layer that fits your custom dataset
m.model.fc = nn.Linear(num_ftrs, num_classes)

train_kwargs = {
    'out_dir': "train_out",
    'adv_train': 1,
    'constraint': '2',
    'eps': 0.5,
    'attack_lr': 1.5,
    'attack_steps': 20
}

train_args = Parameters(train_kwargs)

# Fill whatever parameters are missing from the defaults
train_args = defaults.check_and_fill_args(train_args,
                        defaults.TRAINING_ARGS, MyDataset)
train_args = defaults.check_and_fill_args(train_args,
                        defaults.PGD_ARGS, MyDataset)

# Train a model
train.train_model(train_args, m, (train_loader, val_loader), store=out_store)

I'm getting the following error:

<ipython-input-19-28c6fcfda123> in <module>
     27 
     28 # Fill whatever parameters are missing from the defaults
---> 29 train_args = defaults.check_and_fill_args(train_args,
     30                         defaults.TRAINING_ARGS, MyDataset)
     31 train_args = defaults.check_and_fill_args(train_args,

~/anaconda3/envs/pytorch-flash/lib/python3.9/site-packages/robustness/defaults.py in check_and_fill_args(args, arg_list, ds_class)
    184         if arg_default == REQ: raise ValueError(f"{arg_name} required")
    185         elif arg_default == BY_DATASET:
--> 186             setattr(args, name, TRAINING_DEFAULTS[ds_class][name])
    187         elif arg_default is not None:
    188             setattr(args, name, arg_default)

KeyError: <class 'robustness.datasets.MyDataset'>

Also, in the code above, how can I customize my PGD attack with these parameters?

ATTACK_EPS = 0.05
ATTACK_STEPSIZE = 0.01
ATTACK_STEPS = 100
TARGETED = True
CUSTOM_LOSS = None

I've been copying some examples and modifying the code to adapt it to my dataset, but I'm getting a bit confused by the API.

Hadisalman commented 3 years ago

@elcronos you need to add the training defaults for your dataset to defaults.py, similar to below:

TRAINING_DEFAULTS = {
    datasets.MyDataset: {
        "epochs": 150,
        "batch_size": 128,
        "weight_decay":5e-4,
        "step_lr": 50
    },
.
.
.
}

or reuse existing ones, e.g.

train_args = defaults.check_and_fill_args(train_args,
                        defaults.TRAINING_ARGS, datasets.ImageNet)
train_args = defaults.check_and_fill_args(train_args,
                        defaults.PGD_ARGS, datasets.ImageNet)
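The KeyError comes from the lookup visible in the traceback, `TRAINING_DEFAULTS[ds_class][name]`: defaults.py keeps a dict keyed by dataset class, so a class that was never registered has no entry. The sketch below is a simplified, hypothetical stand-in for that lookup (not the library's actual code); `fill_by_dataset`, the local `TRAINING_DEFAULTS` dict and the placeholder `MyDataset` class are illustrative names only.

```python
# Hypothetical placeholder for the custom dataset class
class MyDataset:
    pass

# Simplified stand-in for robustness.defaults.TRAINING_DEFAULTS:
# a plain dict mapping a dataset *class* to its per-dataset defaults
TRAINING_DEFAULTS = {}

def fill_by_dataset(args, names, ds_class, table=TRAINING_DEFAULTS):
    """Fill every missing (None) entry of `args` from table[ds_class].

    Like the lookup in the traceback, this raises KeyError when
    ds_class was never registered in the table."""
    for name in names:
        if args.get(name) is None:
            args[name] = table[ds_class][name]
    return args

# Register MyDataset, using the values from the defaults.py example above
TRAINING_DEFAULTS[MyDataset] = {
    "epochs": 150,
    "batch_size": 128,
    "weight_decay": 5e-4,
    "step_lr": 50,
}

# Explicit values win; only missing ones come from the table
args = fill_by_dataset({"epochs": 70, "batch_size": None},
                       ["epochs", "batch_size", "step_lr"], MyDataset)
```

With the real library, registering at runtime (e.g. `defaults.TRAINING_DEFAULTS[MyDataset] = {...}` before calling `check_and_fill_args`) should have the same effect as editing defaults.py, since the traceback shows the lookup reads that module-level dict. Treat this as an untested assumption.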

Regarding the attack parameters for adversarial training, you can specify eps, the step size and the number of steps in train_kwargs:

train_kwargs = {
    'out_dir': "train_out",
    'adv_train': 1,
    'constraint': '2',
    'eps': 0.05,
    'attack_lr': 0.01,
    'attack_steps': 100
}

However, train.train_model doesn't support targeted attacks during adversarial training. If you want a targeted PGD attack, you can write your own training loop similar to train.py and, every time you do a forward pass, call


out, xadv = attacker_model(x, y_target, make_adv=True, targeted=True,
                           constraint='2', eps=0.05, step_size=0.01,
                           iterations=100, custom_loss=None)

which returns the targeted adversarial example xadv for the target y_target.
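The same PGD settings go by different names in the two places discussed above: `attack_lr`/`attack_steps` in train_kwargs for train.train_model, and `step_size`/`iterations` as keyword arguments of a direct `attacker_model(...)` call. A minimal sketch of that mapping, assuming an L2 constraint; `build_attack_configs` is a hypothetical helper, not part of the library.

```python
# The constants from the question
ATTACK_EPS = 0.05
ATTACK_STEPSIZE = 0.01
ATTACK_STEPS = 100
TARGETED = True
CUSTOM_LOSS = None

def build_attack_configs(eps, step_size, steps, targeted=False,
                         custom_loss=None, constraint="2"):
    """Hypothetical helper: split one set of PGD settings into
    (a) train_kwargs for train.train_model and
    (b) keyword arguments for attacker_model(x, y, make_adv=True, **kwargs)."""
    train_kwargs = {
        "out_dir": "train_out",
        "adv_train": 1,
        "constraint": constraint,
        "eps": eps,
        "attack_lr": step_size,   # train_model's name for the PGD step size
        "attack_steps": steps,    # train_model's name for the number of PGD steps
    }
    # targeted / custom_loss only appear here: train_model has no targeted mode,
    # so they are meant for a custom loop that calls the model directly
    attack_kwargs = {
        "constraint": constraint,
        "eps": eps,
        "step_size": step_size,
        "iterations": steps,
        "targeted": targeted,
        "custom_loss": custom_loss,
    }
    return train_kwargs, attack_kwargs

train_kwargs, attack_kwargs = build_attack_configs(
    ATTACK_EPS, ATTACK_STEPSIZE, ATTACK_STEPS, TARGETED, CUSTOM_LOSS)
```

In a custom training loop, the second dict would then be passed as `out, xadv = attacker_model(x, y_target, make_adv=True, **attack_kwargs)`.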

Hope this helps.

elcronos commented 3 years ago

Thanks again @Hadisalman,

I was able to fine-tune a custom model. Once training finished, the folder /outputs/8a975b65-bcfc-477b-b679-e7193f81a756 contained a 69_checkpoint.pt file and a checkpoint.pt.best file.

My question now is how I can load those checkpoints later for inference. I tried the code from an example:

ds = MyDataset('/path/to/dataset')
model, _ = make_and_restore_model(arch='resnet50', dataset=ds,
             resume_path='./outputs/8a975b65-bcfc-477b-b679-e7193f81a756/69_checkpoint.pt')

I'm getting the following error:

RuntimeError: Error(s) in loading state_dict for AttackerModel:
    size mismatch for model.fc.weight: copying a param with shape torch.Size([10, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 2048]).
    size mismatch for model.fc.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([1000]).
    size mismatch for attacker.model.fc.weight: copying a param with shape torch.Size([10, 2048]) from checkpoint, the shape in current model is torch.Size([1000, 2048]).
    size mismatch for attacker.model.fc.bias: copying a param with shape torch.Size([10]) from checkpoint, the shape in current model is torch.Size([1000]).

The problem seems to be the mismatch between the state_dict of my custom model, which has 10 outputs in the last layer, and the original ResNet50, which has 1000. How can I modify the code so it loads my custom weights? I also noticed that the files in outputs contain much more information than I need. Is there any way to save and load only the model, without the other training parameters, in a .pt file?

andrewilyas commented 3 years ago

Hi @elcronos!

  1. You need to change the number of classes in your custom dataset; see @Hadisalman's example above.
  2. No, there is no way to do that.
elcronos commented 3 years ago

Hi @andrewilyas,

Thanks for your prompt response. Maybe I was not clear enough.

I understand the part about changing the code:

ds = MyDataset('/path/to/dataset', batch_size=8)
linf_pgd_resnet, _ = make_and_restore_model(arch='resnet50', pytorch_pretrained=False,dataset=ds)
num_ftrs = linf_pgd_resnet.model.fc.in_features
num_classes = 10
linf_pgd_resnet.model.fc = nn.Linear(num_ftrs, num_classes)

But then my question is: how can I correctly load the weights of the model? In vanilla PyTorch I would usually do something like this:

checkpoint = torch.load('./outputs/8a975b65-bcfc-477b-b679-e7193f81a756/69_checkpoint.pt')
linf_pgd_resnet.load_state_dict(checkpoint['model'])

But in this case, it seems that the format the model uses is incompatible, and I get this error:

RuntimeError: Error(s) in loading state_dict for AttackerModel:
    Missing key(s) in state_dict: "normalizer.new_mean", "normalizer.new_std", "model.conv1.weight", "model.bn1.weight", "model.bn1.bias", "model.bn1.running_mean", "model.bn1.running_var", "model.layer1.0.conv1.weight", "model.layer1.0.bn1.weight", "model.layer1.0.bn1.bias"

So clearly this library saves models differently from the torchvision models, which makes the keys incompatible. Is there any way to solve this problem? How can I properly load the state_dict of a model saved with the robustness library?

I hope my question is clearer now.
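A likely cause of the missing keys (an assumption, not confirmed in this thread): train.train_model wraps the model in `torch.nn.DataParallel`, so the saved keys carry a `module.` prefix (`module.normalizer.new_mean`, `module.model.conv1.weight`, ...), and `make_and_restore_model` appears to strip that prefix before calling `load_state_dict`. The hypothetical helper below does the same prefix filtering on any state dict, using only the standard library.

```python
from collections import OrderedDict

def strip_prefix(state_dict, prefix):
    """Keep only the keys that start with `prefix`, with the prefix removed.

    e.g. 'module.model.fc.weight' -> 'model.fc.weight' for prefix 'module.'
         'module.model.fc.weight' -> 'fc.weight'       for prefix 'module.model.'
    """
    return OrderedDict(
        (key[len(prefix):], value)
        for key, value in state_dict.items()
        if key.startswith(prefix)
    )
```

Under that assumption, `linf_pgd_resnet.load_state_dict(strip_prefix(checkpoint['model'], 'module.'))` should load into the AttackerModel whose head was already replaced, while the prefix `'module.model.'` would keep only the bare ResNet weights (key names like `conv1.weight`, `fc.weight`). Note that the bare network then lacks the `normalizer` module, so inputs would have to be normalized with the dataset mean/std beforehand.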

andrewilyas commented 3 years ago

Hi, it seems like your definition of MyDataset looks like:

class MyDataset(DataSet):
    def __init__(self, data_path,**kwargs):
        self.num_classes = 1000

and so the library expects the checkpoint to have a 1000-class output. Try changing this to 10. Also, please see the repository that @Hadisalman linked (the robust-models-transfer repository), which covers how to do this in more depth.
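To keep `num_classes` from drifting out of sync with the data again, one option is to derive it from the folder layout instead of hardcoding it. A minimal sketch, assuming the train folder holds one subfolder per class (the layout torchvision's `ImageFolder` expects, which sorts the subfolder names to assign labels); `class_names` and `count_classes` are hypothetical helpers.

```python
import os

def class_names(data_path, split="train"):
    """Sorted class-folder names under data_path/split
    (one subfolder per class; loose files are ignored)."""
    split_dir = os.path.join(data_path, split)
    return sorted(entry.name for entry in os.scandir(split_dir) if entry.is_dir())

def count_classes(data_path, split="train"):
    """Number of classes implied by the folder layout."""
    return len(class_names(data_path, split))
```

Inside `MyDataset.__init__`, this would replace the hardcoded value with `self.num_classes = count_classes(data_path)`, so the model head, the dataset and the checkpoint all agree on 10 classes.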