ain-soph / trojanzoo

TrojanZoo provides a universal PyTorch platform for conducting security research (especially on backdoor attacks/defenses) in image classification with deep learning.
https://ain-soph.github.io/trojanzoo
GNU General Public License v3.0
271 stars, 62 forks

[Defense] Randomized Smoothing needs modification, plus some naive transformation methods #48

Open ain-soph opened 3 years ago

ain-soph commented 3 years ago

The current Randomized Smoothing implementation is a generic method: we use the averaged logits of samples drawn from a Gaussian distribution as the prediction result. However, according to Certified Adversarial Robustness via Randomized Smoothing, the original method uses a voting mechanism to detect outliers. So it's not a general mitigation method (such as adversarial training or MagNet), but an input detector like STRIP.
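For reference, a minimal sketch of the vote-based prediction in the spirit of Cohen et al. (the function and parameter names are placeholders, not TrojanZoo's defense API, and the abstention/certification step from the paper is omitted):

```python
import torch

def smoothed_predict(model, x, sigma=0.25, n=100, num_classes=10):
    """Classify n Gaussian-noised copies of x and take the majority class,
    instead of averaging logits. The class-count distribution can also be
    inspected to flag outlier inputs, i.e. a detector-style use."""
    counts = torch.zeros(num_classes)
    with torch.no_grad():
        for _ in range(n):
            noisy = x + sigma * torch.randn_like(x)
            pred = model(noisy.unsqueeze(0)).argmax(dim=1)
            counts[pred] += 1
    return counts.argmax().item(), counts
```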

Another thing is to add some naive transformations as defenses (rotation, random cropping, brightness change). These naive methods seem to be very effective; a rough sketch is below.
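A possible test-time transformation defense could look like this (assuming a recent torchvision that applies these transforms directly to tensors; the parameters are illustrative):

```python
import torchvision.transforms as T

# Hypothetical test-time preprocessing: a trigger's effect often depends on
# exact pixel placement, so spatial/color perturbations may break it.
defense_transform = T.Compose([
    T.RandomRotation(degrees=15),
    T.RandomResizedCrop(size=32, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.3),
])

def defended_predict(model, x):
    # x: a (C, H, W) tensor in [0, 1]; transform it before classification.
    return model(defense_transform(x).unsqueeze(0)).argmax(dim=1)
```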

hkunzhe commented 3 years ago

> Another thing is to add some naive transformations as defenses (rotation, random cropping, brightness change). These naive methods seem to be very effective.

These defenses are effective when the mark is added after the normal data transformations. However, when the mark is added before the normal data transformations (#49), these defenses do not work as well, because the augmentations during training make the trigger more robust (a generalization gain).
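A minimal illustration of the two orderings (`add_mark` and `augment` are hypothetical helpers, not the actual TrojanZoo code):

```python
import torch
import torchvision.transforms as T

augment = T.Compose([T.RandomHorizontalFlip(), T.RandomCrop(32, padding=4)])

def add_mark(img, mark_size=3):
    # Hypothetical 3x3 white-square trigger in the bottom-right corner.
    img = img.clone()
    img[:, -mark_size:, -mark_size:] = 1.0
    return img

def poison_after_augment(img):
    # Trigger is untouched by augmentation, so it always sits at a fixed
    # position; test-time spatial transforms easily misalign it.
    return add_mark(augment(img))

def poison_before_augment(img):
    # Trigger itself is flipped/cropped during training (#49), so the model
    # learns a more transformation-robust trigger.
    return augment(add_mark(img))
```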

THUYimingLi commented 3 years ago

> The current Randomized Smoothing implementation is a generic method: we use the averaged logits of samples drawn from a Gaussian distribution as the prediction result. However, according to Certified Adversarial Robustness via Randomized Smoothing, the original method uses a voting mechanism to detect outliers. So it's not a general mitigation method (such as adversarial training or MagNet), but an input detector like STRIP.

> Another thing is to add some naive transformations as defenses (rotation, random cropping, brightness change). These naive methods seem to be very effective.

Hi, Ren Pang,

Thanks for your great efforts on this useful tool. In our work 'Rethinking the Trigger of Backdoor Attack' (https://www.researchgate.net/publication/340541667_Rethinking_the_Trigger_of_Backdoor_Attack), we did some exploration of pre-processing based defenses. We found that spatial transformations (e.g., flipping, shrinking) are relatively effective against most existing standard backdoor attacks. However, classical color-shifting methods (e.g., brightness, contrast) are far less effective, especially when the trigger is visible. Besides, transformations involved in the data augmentation process decrease the effectiveness of those (pre-processing based) defenses to some extent. You can find more details in our paper :).
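A rough sketch of a flip-and-shrink pre-processing step in this spirit (the exact operations and ratios here are illustrative, not the ones used in our paper):

```python
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def flip_shrink_defense(img, shrink_ratio=0.9):
    """Spatial pre-processing defense: horizontally flip the image and shrink
    it slightly (resize down, then zero-pad back to the original size),
    which misaligns position-dependent triggers."""
    c, h, w = img.shape
    img = TF.hflip(img)
    small = F.interpolate(img.unsqueeze(0), scale_factor=shrink_ratio,
                          mode='bilinear', align_corners=False).squeeze(0)
    pad_h, pad_w = h - small.shape[1], w - small.shape[2]
    return F.pad(small, (pad_w // 2, pad_w - pad_w // 2,
                         pad_h // 2, pad_h - pad_h // 2))
```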

ain-soph commented 3 years ago

@THUYimingLi I don't think I'll be able to add those spatial transformation methods in the near future...

And I will not change the order of adding marks as #49 suggests. If the mark is added before the augmentation, attacks that optimize the watermark lose its gradient information through the standard (non-differentiable) augmentation pipeline, which the current code structure cannot handle.
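A minimal illustration of the gradient issue (all names are hypothetical, not the actual attack code):

```python
import torch

img = torch.rand(3, 32, 32)
mark = torch.rand(3, 3, 3, requires_grad=True)   # watermark to be optimized

poisoned = img.clone()
poisoned[:, -3:, -3:] = mark        # differentiable stamping
loss = poisoned.mean()              # stand-in for the model/attack loss
loss.backward()
assert mark.grad is not None        # gradient reaches the mark

# If `poisoned` were converted to a PIL image for the standard torchvision
# augmentations before reaching the model, this gradient path would be cut,
# which is why the current code stamps the mark after the augmentation.
```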

THUYimingLi commented 3 years ago

> @THUYimingLi I don't think I'll be able to add those spatial transformation methods in the near future...

> And I will not change the order of adding marks as #49 suggests. If the mark is added before the augmentation, attacks that optimize the watermark lose its gradient information through the standard (non-differentiable) augmentation pipeline, which the current code structure cannot handle.

I understand your concerns. These are just some simple suggestions. :)

hkunzhe commented 3 years ago

> @THUYimingLi I don't think I'll be able to add those spatial transformation methods in the near future...

> And I will not change the order of adding marks as #49 suggests. If the mark is added before the augmentation, attacks that optimize the watermark lose its gradient information through the standard (non-differentiable) augmentation pipeline, which the current code structure cannot handle.

Hi @ain-soph, kornia has APIs similar to torchvision.transforms and supports differentiable data augmentation on GPUs, which makes it possible to integrate the preprocessing into the training pipeline. Maybe you can try it if it's convenient.
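For example, a rough sketch with kornia's augmentation modules (the hyper-parameters and the stamping step are illustrative, not TrojanZoo's actual code):

```python
import torch
import torch.nn as nn
import kornia.augmentation as K

# Differentiable, GPU-friendly augmentations: gradients still flow back to the mark.
augment = nn.Sequential(
    K.RandomHorizontalFlip(p=0.5),
    K.RandomCrop((32, 32), padding=4),
    K.ColorJitter(brightness=0.2),
)

imgs = torch.rand(8, 3, 32, 32)
mark = torch.rand(3, 3, 3, requires_grad=True)

poisoned = imgs.clone()
poisoned[:, :, -3:, -3:] = mark          # differentiable stamping
augmented = augment(poisoned)            # differentiable augmentation
augmented.mean().backward()              # stand-in for the model/attack loss
print(mark.grad is not None)             # True: the mark can still be optimized
```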