VinAIResearch / Warping-based_Backdoor_Attack-release

WaNet - Imperceptible Warping-based Backdoor Attack (ICLR 2021)
GNU Affero General Public License v3.0

Issues about attack privileges #13

Open xcatf opened 1 year ago

xcatf commented 1 year ago

I'm sorry, I have some questions to ask.

In the WaNet paper, it is mentioned that attackers can control the model's training process, but WaNet seems to only require poisoning of the training set (by mixing "attack" and "noise" samples into the training set) to complete the attack. So, is WaNet a poisoning attack or an attack that controls the training process?

I also noticed that in the WaNet code, when generating poisoned samples, it selects num_bd+num_cross clean samples from each batch in the dataloader. However, the shuffle parameter in the dataloader is set to True, which means that the order of batches will be shuffled in each epoch, so the first num_bd+num_cross clean samples in each epoch are not the same, resulting in different sets of poisoned samples generated in each epoch. If a fixed set of poisoned samples is selected for each epoch, would the WaNet attack still be effective?
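To illustrate the concern, here is a minimal sketch in plain Python. It is not the repository's code. It assumes the first n_poison slots of every batch are the ones that get poisoned, where n_poison stands in for num_bd+num_cross. The function name and the toy sizes are hypothetical.

```python
import random

def poisoned_ids_per_epoch(n_samples, batch_size, n_poison, shuffle, epochs, seed=0):
    """For each epoch, return the set of sample ids that land in the first
    `n_poison` slots of some batch (the slots assumed to be poisoned)."""
    rng = random.Random(seed)
    ids = list(range(n_samples))
    result = []
    for _ in range(epochs):
        order = ids[:]
        if shuffle:
            # Mimics a shuffling loader: a new sample order every epoch.
            rng.shuffle(order)
        chosen = set()
        for start in range(0, n_samples, batch_size):
            chosen.update(order[start:start + n_poison])
        result.append(chosen)
    return result

fixed = poisoned_ids_per_epoch(1000, 100, 20, shuffle=False, epochs=3)
varying = poisoned_ids_per_epoch(1000, 100, 20, shuffle=True, epochs=3)

# Without shuffling, the same ids are poisoned in every epoch.
print(fixed[0] == fixed[1] == fixed[2])
# With shuffling, the union over epochs is larger than a single epoch's set.
print(len(varying[0]), len(varying[0] | varying[1] | varying[2]))
```

Under these assumptions, shuffling means that far more distinct samples are poisoned over the course of training than a fixed poisoning budget would allow.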

Looking forward to your reply!

anhttran commented 1 year ago

Many thanks for your interesting question.

When we wrote the paper, we focused only on attacks with full control over the training process. However, I agree that, with some modifications, the work can be adapted to poisoning attacks.

I have done a quick test on CIFAR-10, in which I fixed the images to be poisoned or noised during training. The attack still succeeded with the desired clean accuracy and ASR.

You can modify our code or check other toolboxes for poisoning-attack versions of our work. Some example toolboxes I have found (but not verified):
https://github.com/THUYimingLi/BackdoorBox
https://github.com/vtu81/backdoor-toolbox
https://github.com/SCLBD/BackdoorBench

I hope this helps to answer your question. Best regards, Anh

xcatf commented 1 year ago

I am very happy to receive your response.

Based on your suggestions, we have implemented two fixed-index variants of the WaNet attack:

(1) Set the shuffle parameter of the DataLoader in dataloader.py to False. The data order is then identical in every epoch, so the same samples are poisoned each time.
(2) Add an index field to each sample in the CIFAR10 dataset and pre-generate the indices of the backdoor samples and noise samples. A sample is poisoned only when its index is in one of those two sets, so the poisoned samples stay the same across epochs.
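Method (2) can be sketched roughly as follows. This is a simplified illustration rather than the exact code used: IndexedDataset, make_fixed_roles, and role_of are hypothetical names, the toy list stands in for CIFAR10, and the counts of 10 backdoor and 20 noise samples are arbitrary. The wrapper follows the map-style dataset protocol (__len__ and __getitem__), so the same idea should carry over to a torch Dataset.

```python
import random

class IndexedDataset:
    """Wraps a map-style dataset so that each item also carries its index."""
    def __init__(self, base):
        self.base = base

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        x, y = self.base[idx]
        return x, y, idx

def make_fixed_roles(n_samples, n_bd, n_cross, seed=0):
    """Pre-generate disjoint index sets for backdoor and noise samples."""
    rng = random.Random(seed)
    picked = rng.sample(range(n_samples), n_bd + n_cross)
    return set(picked[:n_bd]), set(picked[n_bd:])

def role_of(idx, bd_ids, cross_ids):
    """Decide how a sample is treated, based only on its index."""
    if idx in bd_ids:
        return "backdoor"
    if idx in cross_ids:
        return "noise"
    return "clean"

# Toy stand-in for the real dataset: (image, label) pairs.
toy = [(f"img{i}", i % 10) for i in range(100)]
data = IndexedDataset(toy)
bd_ids, cross_ids = make_fixed_roles(len(data), n_bd=10, n_cross=20)

# The role depends only on the index, so it is the same in every epoch,
# regardless of how the loader orders or batches the samples.
roles = [role_of(data[i][2], bd_ids, cross_ids) for i in range(len(data))]
print(roles.count("backdoor"), roles.count("noise"), roles.count("clean"))
```

Because the role is looked up by index, this variant keeps the poisoned set fixed even when shuffle stays True.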

The results from these two fixed-index methods have left me very perplexed. The fixed-index WaNet attack achieved the expected results on MNIST, GTSRB, and CelebA, but not on CIFAR-10:
MNIST: 99.44%
GTSRB: 98.58%
CelebA: 99.77%
CIFAR-10: 94.99% (unexpected)

To rule out a problem with my CIFAR-10 dataset, I ran several attacks on CIFAR-10 using fixed-index methods (1) and (2):
BadNets: 96.13%
Blended: 98.67%
ISSBA: 99.99%
WaNet (fixed index): 94.99%

You can reproduce this by changing "shuffle=True" to "shuffle=False" in the line "dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt.bs, num_workers=opt.num_workers, shuffle=True)" of dataloader.py in your code. With this change, you should get results similar to mine on CIFAR-10.

However, when I ran WaNet without fixed indexing, that is, with the DataLoader's shuffle parameter set to True, I got:
WaNet (no fixed index): 99.26%

Currently, I am unable to reach the desired ASR with WaNet under fixed poisoning on CIFAR-10. Could you tell me which fixed-index approach you used to achieve the desired results on CIFAR-10?

Looking forward to your response!