Hi Mo, Thank you for your interest in our paper.
Hi Dharma,
Thank you very much for your quick response! Your answers are really helpful and I appreciate them a lot. Continuing with my questions:
Thank you very much for your clarification!
I understand that lambda_1 and lambda_2 were selected by grid search according to the performance of each parameter pair, and that we may end up with different optima since the dataset is split randomly on each of our sides. But I was wondering what your best selection of lambda_1 and lambda_2 was when epsilon=0, so that I could compare it with the optimal selection on my end and see how much the optimum varies across different data splits.
For the validation accuracies with different epsilons, I also evaluated the model on the test set, and the results were similar to those on the validation set (around 30% accuracy for the normal model with epsilon=0.175), since the dataset was split randomly into test and validation sets without any special handling. Is there any other possible reason why I could not reproduce the accuracies across epsilons shown by the curve in the paper, please?
Thank you very much for your kind help!!
Hi Mo, I looked at the results and it seems like lambda_1=1 and lambda_2=4.64 gave good values for epsilon=0 [for bbox training]. For a normal model, we should use lambda_1=0 and lambda_2=0 [that's normal CNN training where we don't do any penalization]. I used the train_val set to select the best model during training and then used the val set to select lambda_1 and lambda_2. If you use lambda_1=0 and lambda_2=0 and do normal training, it should produce a test accuracy of around 50% on the test set with epsilon=0.175, which is the value you can see in the graph in the paper. Thank you.
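Roughly, the selection loop looks like the sketch below; the grid values and the two helper functions are illustrative placeholders (assumptions for the sake of the example), not code from this repo:

```python
import itertools
import random
import numpy as np

# Hypothetical stand-ins for the repo's actual training / evaluation scripts;
# here they only simulate an accuracy so the loop below runs as a demo.
def train_bbox_model(lambda_1, lambda_2):
    # Real version: train with the bbox penalty and return the checkpoint
    # that scored best on the train_val split.
    return {"lambda_1": lambda_1, "lambda_2": lambda_2}

def val_accuracy(model):
    # Real version: accuracy of `model` on the val split.
    return random.random()

lambda_grid = np.logspace(0, 1, 4)   # 1, 2.15, 4.64, 10 -- an assumed grid consistent with 4.64 above
best_acc, best_pair = -1.0, None
for lambda_1, lambda_2 in itertools.product(lambda_grid, repeat=2):
    model = train_bbox_model(lambda_1, lambda_2)   # model selection on train_val
    acc = val_accuracy(model)                      # lambda selection on val
    if acc > best_acc:
        best_acc, best_pair = acc, (lambda_1, lambda_2)

print(best_pair)   # e.g. (1.0, 4.64) for epsilon = 0
```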
Thank you so much for your suggestions! I will try them out. For now I will close the issue.
Hi Dharma,
Your work and code look amazing to me, so I am trying to reproduce your experiments. I can basically run the model training end to end, but I have the following questions about the detailed parameter optimization and validation:
In split.py, only a quarter of all the data is saved for training, validation, and testing, which gave me around 1590 images for training and around 530 each for testing and validation. However, when I trained and validated the model on a quarter of the data like this, I could not reach the accuracy you report in the paper. So I changed the code along these lines (sketched below):
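Roughly the following; the data directory and variable names are placeholders rather than the exact lines from split.py, the point being to split all of the shuffled images ~60/20/20 instead of keeping only the first quarter:

```python
import random
from pathlib import Path

# Placeholder path and names -- split.py uses its own layout; the change is
# simply to stop discarding three quarters of the shuffled file list before
# splitting it into train / val / test.
image_paths = sorted(Path("data/images").rglob("*.jpg"))
random.seed(0)
random.shuffle(image_paths)

# previous behaviour (approximately): keep only a quarter of the images
# image_paths = image_paths[: len(image_paths) // 4]

n_train = int(0.6 * len(image_paths))   # ~5914 images when all data are kept
n_val = int(0.2 * len(image_paths))     # ~1971 images
train_paths = image_paths[:n_train]
val_paths = image_paths[n_train:n_train + n_val]
test_paths = image_paths[n_train + n_val:]
```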
This gave me around 5914 training images and around 1971 each for testing and validation, and with that I got validation accuracy similar to what you show in the paper. Did I do the right thing? Please let me know if it was wrong, thank you very much!
What were your optimal choices of lambda_1 and lambda_2 for the different epsilons, please? I tried lambda_1=lambda_2=1 with epsilon=0 using the 5914 training images, and I get accuracy similar to your paper (around 75%). But it would be awesome if you could share more details about the optimal lambda choices for the different epsilon values.
I also validated the pretrained "normal" model (train_method = 'normal') with different epsilons (adversarial perturbation radii). However, I could not get accuracies similar to those in your paper. For example, when I validate the pretrained "normal" model with epsilon=0.175, the validation accuracy I get is only around 30%, while in the paper the validation accuracy at epsilon=0.175 should be around 52% for the "normal" model. The same thing happens with the "bbox" model, where I get 35% validation accuracy using epsilon=0.175 and lambda_1=lambda_2=1, but in the paper the validation accuracy of the "lambda equal" model at epsilon=0.175 should be around 65%. However, when I validate the model with epsilon=0.0025, I get results similar to those reported for epsilon=0.175 in your paper. The following is the code I used for the robust accuracy validation; could you please kindly let me know if there is anything wrong?
```python
import numpy as np
import torch
import torch.nn as nn
import foolbox as fb
from torchvision import datasets, models, transforms

model_path = "/results/resnet50/normal_1_1.pth"
val_dataset_path = '/data/val'
epsilon = 0.175
num_classes = 200
device = torch.device('cuda')

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

val_dataset = datasets.ImageFolder(val_dataset_path, transform)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64,
                                         shuffle=False, num_workers=0)

bounds = (0, 1)
print('Running Attacks...')

# Load the trained checkpoint into a ResNet-50 with a 200-class head.
model = models.resnet50(pretrained=False)
input_features = model.fc.in_features
model.fc = nn.Linear(input_features, num_classes)
model.load_state_dict(torch.load(model_path))
model = model.to(device)
model.eval()

fmodel = fb.PyTorchModel(model, bounds=bounds)
attack = fb.attacks.FGSM()

# Robust accuracy = fraction of samples that FGSM fails to flip at this epsilon.
robust_acc_list = []
for inputs, labels in val_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    _, _, is_adv = attack(fmodel, inputs, labels, epsilons=epsilon)
    robust_acc = 1 - is_adv.float().mean(axis=-1)
    robust_acc_list.append(robust_acc.cpu().numpy())

avg_acc = np.mean(robust_acc_list)
print(avg_acc)
```