Harry24k / adversarial-attacks-pytorch

PyTorch implementation of adversarial attacks [torchattacks]
https://adversarial-attacks-pytorch.readthedocs.io/en/latest/index.html
MIT License
1.86k stars, 348 forks

torchattacks.FGSM(model, eps=0) perturbs image #172

Open michellerosehb opened 9 months ago

michellerosehb commented 9 months ago

❔ Any questions

torchattacks.FGSM(model, eps=0) seems to perturb my data even though eps = 0. On my original data I get an accuracy of 81.2%. When I get predictions after performing an 'attack' with eps=0, my accuracy drops to 55%. The code is attached below. I double-checked whether torchattacks.FGSM(model, eps=0) was the cause by commenting it out and re-checking with the original input data: my accuracy was then 100% again. Also, when I try to de-normalize the output image after torchattacks.FGSM(model, eps=0), I obtain an incorrect image, which implies that the mean and std of the image have changed, i.e. the image has been altered.

Which value of eps does not give any perturbation to my image?

import time

import torchattacks

# Assumes testloader, device, and criterion are defined elsewhere, and that
# testloader uses batch_size=1 (required by the .item() calls below).
def test_FGSM(epoch, model):
    """ Evaluate the classifier on FGSM samples and calculate loss """
    train_loss, correct, total = 0, 0, 0
    counter = 0
    img_label_adv_advlabel = []
    start_time = time.time()
    model.eval()

    for batch_idx, (inputs, labels) in enumerate(testloader):
        # Get inputs
        inputs, labels = inputs.to(device), labels.to(device)
        inputs_img = inputs.squeeze().detach().cpu().numpy()

        # Get predictions based on original image
        pred = model(inputs)
        init_pred = pred.max(1, keepdim=True)[1]
        init_pred = init_pred.squeeze()

        # If the initial prediction is wrong, don't bother attacking, just move on
        if init_pred.item() != labels.item():
            counter += 1
            continue

        loss = criterion(pred, labels)  # From original images

        model.zero_grad()
        loss.backward()

        attack_4 = torchattacks.FGSM(model, eps=0)
        FGSM_output = attack_4(inputs, labels)
        # Move to CPU before converting; .numpy() fails on a CUDA tensor
        FGSM_img = FGSM_output.squeeze().detach().cpu().numpy()

        # Get predictions based on FGSM sample
        pred_FGSM = model(FGSM_output)
        final_pred = pred_FGSM.max(1, keepdim=True)[1]

        # Append original img, original pred_label, FGSM img, final_label
        img_label_adv_advlabel.append((inputs_img, init_pred.item(), FGSM_img, final_pred.item()))

        train_loss += loss.item()
        total += labels.size(0)

        if final_pred.item() == labels.item():
            correct += 1

    execution_time = (time.time() - start_time)
    acc = 100 * correct / total
    print(len(testloader))
    print('time (s): %.2f' % execution_time, 'accuracy', acc, 'counter= ', counter)
    return acc, execution_time, img_label_adv_advlabel
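
One possible explanation, which the thread does not confirm, is that the inputs were normalized with a mean and std before being passed to the attack, while the attack clamps its output to the [0, 1] range. The sketch below uses plain Python rather than torchattacks to show the effect. The names `MEAN`, `STD`, `normalize`, and `fgsm_step` are hypothetical, and the clamp is an assumption about how the attack post-processes its result.

```python
# Hypothetical per-channel statistics; not taken from the thread.
MEAN, STD = 0.5, 0.25


def normalize(pixel):
    """Map a raw pixel in [0, 1] to normalized space."""
    return (pixel - MEAN) / STD


def fgsm_step(x, grad_sign, eps):
    """One FGSM-style update followed by a clamp to [0, 1].

    The clamp is an assumption about the attack's post-processing; it is
    what changes the values below even when eps == 0.
    """
    perturbed = x + eps * grad_sign
    return min(max(perturbed, 0.0), 1.0)


raw_pixels = [0.0, 0.25, 0.5, 0.75, 1.0]
normalized = [normalize(p) for p in raw_pixels]

# eps = 0 on normalized inputs: values outside [0, 1] are clipped.
adv_from_normalized = [fgsm_step(x, 1.0, 0.0) for x in normalized]
# eps = 0 on raw [0, 1] inputs: nothing changes.
adv_from_raw = [fgsm_step(p, 1.0, 0.0) for p in raw_pixels]

print("normalized input:   ", normalized)
print("eps=0 on normalized:", adv_from_normalized)
print("eps=0 on raw equals input:", adv_from_raw == raw_pixels)
```

If this is what happens, feeding the attack raw [0, 1] images and applying normalization inside the model would avoid the clipping. Recent torchattacks releases also document a `set_normalization_used(mean, std)` helper for normalized inputs; whether it is available depends on the installed version.
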
rikonaka commented 9 months ago

Hi @michellerosehb, I ran a simple test and found no problem with FGSM.

I guess there is something wrong with part of your code.

By the way, this code you provided is very difficult to read 😂.

[Image: test]

poiug07 commented 6 months ago

I encounter a similar problem when using PGDL2 with eps=0. The problem doesn't always occur, but it appears after some epochs of training.

In my case specifically, it causes adv_images to take nan values.

rikonaka commented 6 months ago

> I encounter a similar problem when using PGDL2 with eps=0. The problem doesn't always occur, but it appears after some epochs of training.
>
> In my case specifically, it causes adv_images to take nan values.

Forgive me for asking a question first, but why use an eps=0 attack? Mathematically, this just yields the original image, so why not use the original image directly? 😂

As for PGDL2, according to https://github.com/Harry24k/adversarial-attacks-pytorch/issues/161, there is a problem with the PGDL2 algorithm that he is still trying to fix. 😜 Perhaps you could wait until he has finished fixing it before trying the new code, or you could provide code that reproduces the problem to help me find out what happened.

poiug07 commented 6 months ago

> why use an eps=0 attack?

To check the correctness of the algorithm. Under eps=0, it should have the same accuracy as normal training.

Thanks for linking the issue. I will take a look, although I believe it should just sample the zero vector when eps=0.
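
One way an L2-bounded attack could produce nan at eps=0 is sketched below. It assumes the perturbation is projected onto the L2 ball by scaling it with eps / ||delta||. Whether PGDL2 in torchattacks does exactly this is not confirmed in the thread. The helpers `ieee_div`, `project_naive`, and `project_safe` are hypothetical, and `ieee_div` only mimics the 0/0 = nan behaviour of floating-point tensors, since plain Python raises an exception instead.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def ieee_div(a, b):
    """Divide the way IEEE 754 tensor arithmetic does: 0/0 -> nan, x/0 -> inf."""
    if b == 0.0:
        return math.nan if a == 0.0 else math.copysign(math.inf, a)
    return a / b


def project_naive(delta, eps):
    """Scale delta into the L2 ball of radius eps, with no guard on the norm."""
    factor = ieee_div(eps, l2_norm(delta))
    if factor > 1.0:  # nan > 1.0 is False, so nan passes through unchanged
        factor = 1.0
    return [d * factor for d in delta]


def project_safe(delta, eps, tiny=1e-12):
    """Same projection with a small constant added to the denominator."""
    factor = min(1.0, eps / (l2_norm(delta) + tiny))
    return [d * factor for d in delta]


print(project_naive([3.0, 4.0], 0.0))  # non-zero delta is scaled to zero
print(project_naive([0.0, 0.0], 0.0))  # zero delta hits 0/0 and becomes nan
print(project_safe([0.0, 0.0], 0.0))   # guarded version stays at zero
```

Under this assumption the nan only appears when the perturbation itself is exactly zero, for example when the gradient vanishes. That would be consistent with the problem showing up only after some epochs of training, but it remains a guess until a reproducing script is available.
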

rikonaka commented 6 months ago

> > why use an eps=0 attack?
>
> To check the correctness of the algorithm. Under eps=0, it should have the same accuracy as normal training.
>
> Thanks for linking the issue. I will take a look, although I believe it should just sample the zero vector when eps=0.

So far, in my local testing (200 test images from CIFAR10), the PGDL2 algorithm does not show an abnormal attack success rate with eps=0. In fact, the MSE losses for the adversarial example and the original example show that the two are identical.
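
A check of this kind can be sketched in a few lines. The helpers `mse` and `max_abs_diff` and the sample values are hypothetical and operate on flat lists of floats. With real tensors, the same comparison would be made between the input batch and the attack output.

```python
def mse(a, b):
    """Mean squared error between two equally long lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


original = [0.25, 0.5, 0.75]   # stand-in for an input image
unchanged = list(original)     # what an eps=0 attack should return
altered = [0.25, 0.5, 1.0]     # what a silently modified output looks like

print(mse(original, unchanged), max_abs_diff(original, unchanged))
print(mse(original, altered), max_abs_diff(original, altered))
```

A non-zero maximum difference at eps=0 indicates that something other than the perturbation budget is modifying the input.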

Regarding adv_images=nan during training, can you provide runnable code that I can use for testing? I'm not sure right now whether the problem lies in the training code or in torchattacks.

[Image: Test]

I just ran a test on CIFAR10 using resnet18, and at no point during epochs 0 to 100 did PGDL2's accuracy behave abnormally. The reason it is not 0 is that some samples are misclassified.

[Image: Test]