Harry24k / adversarial-attacks-pytorch

PyTorch implementation of adversarial attacks [torchattacks].
https://adversarial-attacks-pytorch.readthedocs.io/en/latest/index.html
MIT License

[QUESTION] Low success rate for the Carlini-Wagner attack. #171

Closed Adversarian closed 6 months ago

Adversarian commented 6 months ago

❔ Any questions

I didn't file this under [BUG] because I'm certain the problem is with my use of the library rather than the library itself. I am trying to create adversarial samples using the CW attack method, but unfortunately I am not having much success with it.

I am using the CWBSL2 method from @rikonaka's PR, which I believe has the same underlying CW implementation as the base library, only with an added binary search component.

Things I've tried:

I have a simple ResNet18 victim model on CIFAR10 which uses normalization. I have made sure to call the attack's set_normalization_used method, and my input images are sampled directly from CIFAR10 with only a ToTensor() transform applied on top of them to bring them into the (0, 1) range.
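Roughly, my setup looks like the sketch below (my_resnet18 is a placeholder for my private model code, and I'm using the stock CW attack here just for illustration; in my actual code it's CWBSL2 from the PR):

import torch
import torchvision
import torchvision.transforms as transforms
from torchattacks import CW

# CIFAR10 test set with only ToTensor(), so pixel values stay in (0, 1)
test_set = torchvision.datasets.CIFAR10(
    root='./data', train=False, download=True,
    transform=transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = my_resnet18().to(device).eval()  # placeholder: my private ResNet18 trained on normalized CIFAR10

atk = CW(model, c=1, kappa=14, steps=100, lr=0.01)
# the model was trained on normalized inputs, so I register the CIFAR10 statistics with the attack
atk.set_normalization_used(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))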

Since my code is part of a private repository I cannot share it directly, but I might be able to cook up a snippet to reproduce this if need be.

The robust classification accuracy under CW is around 89%, while the exact same piece of code gives me robust accuracies of 0.25 for FGSM and 0.0 for PGD.

rikonaka commented 6 months ago

Hi @Adversarian, I have run several tests so far, but I do not currently have a normalized model, so I cannot run the related tests yet πŸ₯². My result is that CWBS (or CWBSL2) can achieve an attack success rate of 100% on 100 images.

atk = CWBS(model, init_c=1, steps=10, lr=0.01, binary_search_steps=10, abort_early=False)

[results with abort_early=False]

And if I set abort_early=True

atk = CWBS(model, init_c=1, steps=10, lr=0.01, binary_search_steps=10, abort_early=True)

[results with abort_early=True]

So far, the performance is normal. I will try the normalized model next 😨.

Adversarian commented 6 months ago

These were my hyperparameters although I did change them around quite a bit:

"CW": {
            "kappa": 14,
            "init_c": 1,
            "binary_search_steps": 10,
            "steps": 100,
            "lr": 1e-2,
}

I suspect it might have something to do with normalization but I'm still afraid I might be doing something wrong here. I will try to create a minimal reproducible example when I can.
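For reference, this is roughly how that config gets unpacked into the attack constructor in my code (assuming the CWBSL2 constructor from the PR accepts these keyword names, including kappa; model is the victim from my sketch above):

cw_cfg = {
    "kappa": 14,               # confidence margin; I'm assuming CWBSL2 takes this keyword
    "init_c": 1,
    "binary_search_steps": 10,
    "steps": 100,
    "lr": 1e-2,
}
atk = CWBSL2(model, **cw_cfg)  # CWBSL2 comes from @rikonaka's PR
atk.set_normalization_used(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))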

rikonaka commented 6 months ago

> These were my hyperparameters although I did change them around quite a bit:
>
> "CW": {
>             "kappa": 14,
>             "init_c": 1,
>             "binary_search_steps": 10,
>             "steps": 100,
>             "lr": 1e-2,
> }
>
> I suspect it might have something to do with normalization but I'm still afraid I might be doing something wrong here. I will try to create a minimal reproducible example when I can.

I just trained a ResNet18 model, and the attack success rate of CWBS is still 100% 😨.

import torch
from tqdm import tqdm

# load_model, test_loader and device are defined elsewhere;
# CWBS comes from the binary-search CW PR branch
model = load_model()
model.eval()
total = 0
success = 0

# atk = CWL0(model, c=1, steps=10, lr=0.01, abort_early=True)
# atk = CW(model, c=1, steps=10, lr=0.01, abort_early=True)
# atk = CWLinf(model, c=1, steps=10, lr=0.01, abort_early=True)
# atk = CWBS(model, init_c=1, steps=10, lr=0.01, binary_search_steps=10, abort_early=False)
atk = CWBS(model, init_c=1, steps=10, lr=0.01, binary_search_steps=10, abort_early=True)  # nopep8
atk.set_normalization_used(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))  # nopep8

with tqdm(total=len(test_loader), desc='Test') as tbar:
    for batch_idx, (x, y) in enumerate(test_loader):

        if total > 1000:
            break

        x, y = x.to(device), y.to(device)
        total += y.shape[0]
        adv_images = atk(x, y)
        adv_pred = model(adv_images)
        # print(y)
        # print(torch.argmax(adv_pred, 1))
        success += torch.sum(y != torch.argmax(adv_pred, 1)).item()
        tbar.update()

success_rate = success / total
print("Attack success rate: {:.3f}".format(success_rate))

[result screenshot]

Maybe there are some problems in your code.

Adversarian commented 6 months ago

Thank you for your time! I'm going to try to put together a working example and post it here. Until then, I will close the issue, since your testing is sufficient evidence that something must be going wrong in my code.

Adversarian commented 6 months ago

Sorry for reopening this issue so early, but one thing in your snippet jumped out at me. Can you please show me how you've constructed your test_loader? I'm asking because I see that you're passing adv_images to the victim model without normalization.

I've been normalizing the outputs of the attacks with the precalculated mean and std before passing them to the victim model, which I think is correct because the robust accuracy improves significantly after doing this; that leads me to believe that the outputs of the attacks are not normalized. Am I making a mistake here?
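Concretely, my evaluation loop has been doing roughly this (the Normalize transform is my own addition on top of the attack output; model, atk, x and y are as in my setup sketch above):

import torch
import torchvision.transforms as transforms

# my own extra normalization step, applied to the attack's output
normalize = transforms.Normalize(mean=(0.4914, 0.4822, 0.4465), std=(0.247, 0.243, 0.261))

adv_images = atk(x, y)                   # x comes straight from ToTensor(), in (0, 1)
adv_pred = model(normalize(adv_images))  # I normalize the adversarial images myself here
robust_acc = (y == adv_pred.argmax(1)).float().mean().item()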

Adversarian commented 6 months ago

Never mind, I think I see where I've been going wrong. I had assumed that set_normalization_used means I have to pass unnormalized images to the attacker, but after reviewing the code I understand that I actually have to feed already normalized images to the attacker when using it, and the attacker will return normalized outputs which I can pass directly into the model.

I will make these changes to my code and hopefully this time the issue will stay closed πŸ™‚.

Adversarian commented 6 months ago

I'm here to report that my issue was successfully resolved. I'm leaving this comment here in case anyone else finds themselves in a similar situation. If you've migrated to Torchattacks from Foolbox, note that the two libraries behave differently with regard to normalization, which is why I was struggling with this in the first place.

When you use the set_normalization_used method of an attack in Torchattacks, the attack expects an already normalized input. Here's what happens when you pass a batch of images to the attack once set_normalization_used has been called:

  1. Inputs are brought to the original distribution through an inverse normalization.
  2. Attack is performed and adversarial images (adv_images) are obtained.
  3. adv_images is clamped to the (0, 1) range.
  4. The result is normalized again with the mean and std you set when calling set_normalization_used.
  5. The result is returned.

So essentially, when set_normalization_used is called, the attack expects an already normalized input whose inverse normalization lies in the (0, 1) range, and it returns a batch of normalized adversarial images which you won't have to normalize again before feeding them to your victim model.
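A rough sketch of that flow in plain PyTorch, based on my reading of the source (attack_with_normalization and atk_core are my own names for illustration, not the library's API):

import torch

def attack_with_normalization(atk_core, images, labels, mean, std):
    # images are assumed to already be normalized with (mean, std),
    # exactly as the victim model expects
    mean = torch.tensor(mean).view(1, -1, 1, 1).to(images.device)
    std = torch.tensor(std).view(1, -1, 1, 1).to(images.device)

    raw = images * std + mean              # 1. inverse-normalize back to (0, 1)
    adv_images = atk_core(raw, labels)     # 2. run the attack in pixel space
    adv_images = adv_images.clamp(0, 1)    # 3. clamp to the valid image range
    return (adv_images - mean) / std       # 4./5. re-normalize and return

So the returned batch can go straight into the victim model; applying an extra Normalize on top of it, like I had been doing, effectively normalizes twice.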