ck-amrahd / birds


Questions about optimal values of the lambdas, data split and epsilon values #1

Closed mohanhanmo closed 3 years ago

mohanhanmo commented 3 years ago

Hi Dharma,

Your work and code look amazing to me, so I have been trying to reproduce your experiment. I can run the model training end to end, but I have the following questions about the detailed parameter optimization and validation:

  1. In split.py, only a quarter of the data is saved for training, validation and testing, because only every fourth image is kept:
    • for img_name in train_images[::4]:
    • for img_name in train_val_images[::4]:
    • for img_name in val_images[::4]:
    • for img_name in test_images[::4]:

which gave me around 1590 images for training and around 530 each for testing and validation. However, when I trained and validated the model on this quarter of the data, I could not reach the accuracy reported in the paper, so I changed the code to the following:

  • for img_name in train_images:
  • for img_name in train_val_images:
  • for img_name in val_images:
  • for img_name in test_images:

which gave me around 5914 training images and around 1971 each for testing and validation. With this change I got validation accuracy similar to what you report in the paper (see the sketch of the two variants below). Did I do the correct thing? Please let me know if it was wrong, thank you very much!
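For reference, a minimal sketch of the difference between the two split variants (the file names below are dummies; in split.py the lists come from the dataset annotations):

    # Minimal sketch of the two split variants above (dummy file names;
    # in split.py the lists come from the dataset annotations).
    train_images = [f'img_{i:05d}.jpg' for i in range(5914)]

    quarter = train_images[::4]   # every 4th image, as in the original split.py
    full = train_images           # the full list, as in the modified version

    print(len(quarter), len(full))  # -> 1479 5914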

  2. What were your optimal choices of lambda_1 and lambda_2 for the different epsilons? I tried lambda_1=lambda_2=1 with epsilon=0 on the 5914 training images, which gives me accuracy similar to your paper (around 75%). But it would be awesome if you could share more details about the optimal lambda choices for the different epsilon values.

  3. I also validated the pretrained "normal" model (train_method = 'normal') with different epsilons (adversarial perturbation radii), but I could not reproduce the accuracies from the paper. For example, validating the pretrained "normal" model with epsilon=0.175 gives me only around 30% validation accuracy, while in the paper the validation accuracy at epsilon=0.175 should be around 52% for the "normal" model. The same happens with the "bbox" model, where I get 35% validation accuracy with epsilon=0.175 and lambda_1=lambda_2=1, while in the paper the "lambda equal" model at epsilon=0.175 should be around 65%. However, when I validate with epsilon=0.0025, I get results similar to those reported at epsilon=0.175 in your paper. The following is the code I used for the robust-accuracy validation; could you please kindly let me know if there is anything wrong?

    
    import torch
    import torch.nn as nn
    from torchvision import transforms
    import foolbox as fb
    import numpy as np
    from torchvision import datasets, models
    import pickle
    import matplotlib.pyplot as plt

    model_path = "/results/resnet50/normal_1_1.pth"
    val_dataset_path = '/data/val'
    epsilon = 0.175

    num_classes = 200
    device = torch.device('cuda')

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    val_dataset = datasets.ImageFolder(val_dataset_path, transform)
    val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=0)

    bounds = (0, 1)
    print(f'Running Attacks...')

    # load the pretrained checkpoint into a ResNet-50 with 200 output classes
    model = models.resnet50(pretrained=False)
    input_features = model.fc.in_features
    model.fc = nn.Linear(input_features, num_classes)
    model.load_state_dict(torch.load(model_path))
    model = model.to(device)  # the inputs are moved to the GPU below, so the model must be too

    model.eval()
    fmodel = fb.PyTorchModel(model, bounds=bounds)
    attack = fb.attacks.FGSM()

    robust_acc_list = []
    for inputs, labels in val_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        _, _, is_adv = attack(fmodel, inputs, labels, epsilons=epsilon)
        robust_acc = 1 - is_adv.float().mean(axis=-1)  # fraction of samples the attack failed on
        robust_acc_list.append(robust_acc.cpu().numpy())

    avg_acc = np.mean(robust_acc_list)
    print(avg_acc)
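As a side note, since the paper reports accuracy over a range of perturbation radii, the same loop can sweep several epsilons in one pass: foolbox accepts a list for `epsilons` and returns one success flag per epsilon. A minimal sketch, reusing `fmodel`, `attack`, `val_loader` and `device` from the code above (the particular radii are just example values):

    # Sketch: robust accuracy over several epsilons in one pass.
    # Reuses fmodel, attack, val_loader and device defined above; the
    # epsilon values are examples (0.0025 and 0.175 are the ones
    # discussed in this thread).
    epsilons = [0.0025, 0.05, 0.1, 0.175]
    success_batches = []
    for inputs, labels in val_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        # with a list of epsilons, is_adv has shape (len(epsilons), batch_size)
        _, _, is_adv = attack(fmodel, inputs, labels, epsilons=epsilons)
        success_batches.append(is_adv.float().cpu().numpy())

    success = np.concatenate(success_batches, axis=1)  # (len(epsilons), num_samples)
    robust_acc = 1.0 - success.mean(axis=1)
    for eps, acc in zip(epsilons, robust_acc):
        print(f'epsilon={eps}: robust accuracy={acc:.3f}')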



Thank you in advance for your kind help and time!!!
ck-amrahd commented 3 years ago

Hi Mo, Thank you for your interest in our paper.

  1. Yes, you are correct. It should be train_images; I was experimenting with a smaller dataset and forgot to revert to the previous values.
  2. For lambda_1 and lambda_2, we selected them by training multiple models, as mentioned in the paper, and selecting the model that performs best for a given value of epsilon.
  3. The validation set was used to select among the different trained models, and the accuracies mentioned in the paper are for the test dataset. I hope this clears up your confusion. Thank you.
mohanhanmo commented 3 years ago

Hi Dharma,

Thank you very much for your quick response! Your answers are really helpful and I appreciate them a lot. Continuing with my questions:

  1. Thank you very much for your clarification!

  2. I understand that lambda_1 and lambda_2 were selected by grid search according to the performance of each parameter pair, and that we may get different optimal values of lambda_1 and lambda_2 since the dataset is split randomly on each side. But I was wondering what the best selection of lambda_1 and lambda_2 was on your side when epsilon=0, so that I can compare it with the optimal selection on my end and see how much the optimal choice varies across data splits.

  3. For the validation accuracies at different epsilons, I also tested the model on the test dataset, and the results were similar to those on the validation dataset (around 30% accuracy for the normal model with epsilon=0.175), since the dataset is split randomly into test and validation sets without any further distinction. Is there any other possible reason why I could not get accuracies similar to the curve in the paper for the different epsilons?

Thank you very much for your kind help!!

ck-amrahd commented 3 years ago

Hi Mo, I looked at the results and it seems that lambda_1=1 and lambda_2=4.64 gave good values for epsilon equal to 0 [for bbox training]. For a normal model, we should use lambda_1=0 and lambda_2=0 [that is normal CNN training, where we don't apply any penalization]. I used the train_val set to select the best model during training and then the val set to select lambda_1 and lambda_2. If you use lambda_1=0 and lambda_2=0 and do normal training, it should produce a test accuracy of around 50% on the test set with epsilon=0.175, which is the value you can see in the graph in the paper. Thank you.
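For reference, a minimal sketch of the selection procedure described above (train a grid of (lambda_1, lambda_2) models, keep the checkpoint that does best on train_val during training, then pick the lambdas by val accuracy). `train_model` and `evaluate` are hypothetical placeholders, not functions from this repository:

    # Hedged sketch of the selection procedure described above; not code
    # from this repository. train_model and evaluate are hypothetical
    # placeholders standing in for the actual training/evaluation scripts.
    import itertools
    import random

    def train_model(lambda_1, lambda_2):
        """Placeholder: train with the given penalty weights, keeping the
        checkpoint that performs best on the train_val split."""
        return {'lambda_1': lambda_1, 'lambda_2': lambda_2}

    def evaluate(model, epsilon):
        """Placeholder: robust accuracy on the val split at this radius."""
        return random.random()

    lambda_grid = [0.0, 0.1, 1.0, 4.64, 10.0]  # example grid; 1 and 4.64 appear above
    epsilon = 0.0

    best_acc, best_pair = -1.0, None
    for l1, l2 in itertools.product(lambda_grid, lambda_grid):
        acc = evaluate(train_model(l1, l2), epsilon)
        if acc > best_acc:
            best_acc, best_pair = acc, (l1, l2)

    print(best_pair, best_acc)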

mohanhanmo commented 3 years ago

Thank you so much for your suggestions! I will try them out. For now I will close the issue.