Trusted-AI / adversarial-robustness-toolbox

Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
https://adversarial-robustness-toolbox.readthedocs.io/en/latest/
MIT License

Discrepancy among AdversarialPatch* attacks for the same parameters #1663

Open beat-buesser opened 2 years ago

beat-buesser commented 2 years ago

I have received an external anonymous report from a user of ART that there seems to be a discrepancy among the NumPy, TensorFlow and PyTorch versions of the Adversarial Patch attack.

The experiment generated three patches using the same data and hyperparameters, and models that were kept as similar as possible but have slightly different parameters because they were trained in their respective frameworks.

All generated patches are effective when applied, but the patches generated by AdversarialPatchPyTorch look different compared to the other two patches.

It looks like the AdversarialPatchPyTorch patch is in a different domain than the other two patches, because it shows no gray pixels and has significantly more primary colours. At full image size, the AdversarialPatchPyTorch attack does not generate an effective patch and instead returns a gray patch for this scale.
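One way to quantify the "different domain" observation is to compare pixel statistics of the patches: a patch dominated by primary colours has most channel values pinned to the clip boundaries, while a gray patch sits mid-range. A minimal NumPy sketch (the two patch arrays below are synthetic placeholders, not the actual generated patches):

```python
import numpy as np

def patch_stats(patch):
    """Summarize a patch's value domain for patches clipped to [0, 1].

    frac_saturated is the fraction of channel values near the clip
    boundaries; a 'primary-colour' patch scores high, a gray patch low.
    """
    saturated = np.mean((patch <= 0.05) | (patch >= 0.95))
    return {
        "min": float(patch.min()),
        "max": float(patch.max()),
        "frac_saturated": float(saturated),
    }

# Placeholder patches standing in for the generated ones:
rng = np.random.default_rng(0)
gray_patch = np.clip(0.5 + 0.05 * rng.standard_normal((224, 224, 3)), 0, 1)
primary_patch = rng.integers(0, 2, (224, 224, 3)).astype(float)  # values pinned to 0/1

print(patch_stats(gray_patch)["frac_saturated"])     # close to 0.0
print(patch_stats(primary_patch)["frac_saturated"])  # close to 1.0
```

Comparing these statistics across the NumPy, TensorFlow and PyTorch patches would show whether the PyTorch patch really optimizes into a different region of pixel space.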

kieranfraser commented 1 year ago

Created adversarial patches using each framework, keeping data, model and hyperparameters fixed (as far as possible).

Model: ResNet50 (torchvision and keras)
- target_name = 'goldfish, Carassius auratus'
- image_shape = (3, 224, 224) or (224, 224, 3)
- clip_values = (0, 1) or (0, 255)
- nb_classes = 1000
- batch_size = 16
- scale_min = 0.4
- scale_max = 1.0
- rotation_max = 22.5
- learning_rate = 5000.
- max_iter = 500

All patches are effective and show some small visual discrepancies between each other, but not alarmingly so. I was also able to generate an adversarial patch using AdversarialPatchPyTorch at full image size (3, 224, 224).
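For reference, the scale parameters above control how large the patch is rendered relative to the image, with scale = 1.0 meaning the patch covers the whole image. A minimal NumPy sketch of that mapping (a hypothetical helper using centred placement and nearest-neighbour resizing, not ART's actual implementation, which uses proper interpolation plus random placement and rotation):

```python
import numpy as np

def insert_patch(image, patch, scale):
    """Resize a square patch to scale * image side and paste it centred."""
    h = image.shape[0]
    side = int(round(scale * h))
    # Nearest-neighbour resize of the patch to (side, side).
    idx = (np.arange(side) * patch.shape[0] / side).astype(int)
    resized = patch[idx][:, idx]
    out = image.copy()
    top = (h - side) // 2
    out[top:top + side, top:top + side] = resized
    return out

image = np.zeros((224, 224, 3))
patch = np.ones((224, 224, 3))

full = insert_patch(image, patch, scale=1.0)  # patch covers the whole image
half = insert_patch(image, patch, scale=0.4)  # scale_min from the experiment

print(full.mean())  # 1.0 -> every pixel belongs to the patch
```

At scale_min = 0.4 the patch covers only about 16% of the image area (0.4 squared), which is why the attack must remain effective over the whole [scale_min, scale_max] range.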

Notebooks with generated patches for each framework: