VinAIResearch / Warping-based_Backdoor_Attack-release

WaNet - Imperceptible Warping-based Backdoor Attack (ICLR 2021)
GNU Affero General Public License v3.0

About the detection of Neural Cleanse #8

Closed zeabin closed 2 years ago

zeabin commented 2 years ago

All of the pretrained models you provide have an anomaly index smaller than 2 in Neural Cleanse. However, when I train more backdoor models with the default settings on MNIST, CIFAR-10, and GTSRB and test NC detection, only the models on MNIST have a small anomaly index; the models on CIFAR-10 and GTSRB have an anomaly index larger than 3 (on average). Is there any trick to training the backdoor model?
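For context, the anomaly index in Neural Cleanse is a MAD-based outlier score over the L1 norms of the per-class reverse-engineered trigger masks; a model is flagged as backdoored when the index exceeds 2. A minimal sketch of that score (a hypothetical helper, not the NC authors' code):

```python
import numpy as np

def anomaly_index(l1_norms):
    """MAD-based anomaly index as used by Neural Cleanse.

    l1_norms: per-class L1 norms of the reverse-engineered trigger masks.
    Returns the anomaly index of the smallest norm (the suspected target
    class); a value > 2 flags the model as backdoored.
    """
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    # the constant 1.4826 makes MAD a consistent estimator of the
    # standard deviation under a normal distribution
    mad = 1.4826 * np.median(np.abs(norms - median))
    return abs(norms.min() - median) / mad
```

For example, `anomaly_index([50, 52, 48, 51, 49, 5])` is large (one class has an abnormally small trigger), while `anomaly_index([50, 52, 48, 51, 49, 50])` stays below 2.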

tuananh12101997 commented 2 years ago

Thank you for your interest in our paper.

Yes. For simple datasets like CIFAR-10, MNIST, and GTSRB, the warping mask plays an important role in creating a powerful backdoor attack. For example, the warping field should be concentrated (have high values) on the region of the object (i.e., the center of the image).
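A center-concentrated weighting like the one suggested above could be sketched as follows (a hypothetical Gaussian mask, not part of the released code, to be multiplied elementwise into the warping field; `sigma` controls how tightly the warping focuses on the object region):

```python
import numpy as np

def center_mask(image_size=32, sigma=0.35):
    """Gaussian weighting that peaks at the image center, so an
    elementwise-multiplied warping field acts mainly on the object
    region rather than the background."""
    ys, xs = np.mgrid[0:image_size, 0:image_size]
    c = (image_size - 1) / 2
    r2 = ((xs - c) ** 2 + (ys - c) ** 2) / (sigma * image_size) ** 2
    return np.exp(-r2 / 2)  # ~1.0 at the center, decaying toward the borders
```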

However, for now, we just use random warping masks. Therefore, the backdoor effect can be unstable against backdoor detection methods like NC.

There are two tricks that might increase the stability of the backdoor effect: 1. increasing the warping strength $s$, or 2. increasing the warping grid size $k$. However, both might hurt the quality of the backdoored images.
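The role of $s$ and $k$ in these two tricks can be sketched roughly as below. This is a simplified NumPy illustration under stated assumptions (nearest-neighbour upsampling of the control grid, whereas the paper upsamples smoothly; the function name and normalization are hypothetical), not the repository's implementation:

```python
import numpy as np

def make_warping_field(k=4, s=0.5, image_size=32, seed=0):
    """Sketch of a WaNet-style warping field.

    k: size of the random control-point grid (trick 2: larger k gives a
       finer, stronger warp).
    s: warping strength (trick 1: larger s gives larger displacements).
    Returns a per-pixel (dx, dy) displacement field; increasing either
    parameter strengthens the backdoor but degrades image quality.
    """
    rng = np.random.default_rng(seed)
    # random control-point displacements in [-1, 1], two channels (dx, dy)
    flow = rng.uniform(-1, 1, size=(k, k, 2))
    flow = flow / np.mean(np.abs(flow))   # normalize average magnitude to 1
    flow = flow * s / image_size          # scale displacements by strength s
    # nearest-neighbour upsampling to image resolution (paper uses smooth
    # interpolation)
    rep = image_size // k
    return np.kron(flow, np.ones((rep, rep, 1)))  # (image_size, image_size, 2)
```

The resulting field would then be added to the identity sampling grid to warp the input image; the small per-pixel displacements are what keep the trigger imperceptible.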