IBM / Autozoom-Attack

Code for reproducing the query-efficient black-box attacks in “AutoZOOM: Autoencoder-based Zeroth Order Optimization Method for Attacking Black-box Neural Networks”, published at AAAI 2019
https://arxiv.org/abs/1805.11770
Apache License 2.0

How to get performance of "initial success" attack as reported in the paper? #3

Closed by ZiangYan 4 years ago

ZiangYan commented 5 years ago

Hi,

I've recently been working on adversarial-example topics, and this codebase has helped me a lot.

In Tables 1-3 of the AAAI paper you report query counts and per-pixel l2 distortions for initial success, and in Section 4.3 (TPR and initial success) you state "...initial success, where an initial success refers to the first query count that finds a successful adversarial example". However, I ran into some trouble reproducing the initial-success results:

  1. If I modify L289-L296 of blackbox_attack.py to

            if l2 < o_bestl2 and self.compare(score, np.argmax(lab)):
                # print a message if it is the first attack found
                if o_bestl2 == 1e10:
                    #print("save modifier")
                    print("[STATS][FirstAttack] iter:{}, const:{}, cost:{}, time:{:.3f}, size:{}, loss:{:.5g}, loss1:{:.5g}, loss2:{:.5g}, l2:{:.5g}".format(iteration, CONST, self.eval_costs, self.train_timer, self.real_modifier.shape, l, loss1, loss2, l2))
                    self.post_success_setting()
                    lower_bound = 0.0
                    return nimg

    , do I get the correct initial success adversarial example?

  2. With the above modification, I ran

    python3 main.py -a zoo -d mnist -n 100 --m 1000 \
    --batch_size 128 --switch_iterations 100 \
    --init_const 0.1 --img_resize 14

    , and I got l2_avg=2.62 for the MNIST ZOO setting (per-pixel l2=2.62/784=3.34e-3), which is comparable with the performance reported in your paper (per-pixel l2=3.50e-3). However, for CIFAR10, I ran

    python3 main.py -a zoo_ae -d cifar10 -n 100 --m 1000 \
    --switch_iterations 1000 --init_const 10 \
    --codec_prefix codec/cifar10_2

    , and I got l2_avg=6.16 (per-pixel l2=6.16/(3*32*32)=2.0e-3), which is much larger than the 8.74e-4 reported in the paper.
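The per-pixel figures above divide the average l2 distortion by the number of input values. A minimal sketch of that arithmetic, assuming 28x28x1 MNIST inputs and 32x32x3 CIFAR-10 inputs; `per_pixel_l2` is a hypothetical helper name, not a function from the repository:

```python
import math


def per_pixel_l2(l2_avg, image_shape):
    """Divide an average l2 distortion by the number of input values."""
    return l2_avg / math.prod(image_shape)


# Values quoted in this thread; the image shapes are assumptions.
mnist = per_pixel_l2(2.62, (28, 28, 1))
cifar10 = per_pixel_l2(6.16, (32, 32, 3))

print("MNIST   per-pixel l2: {:.2e}".format(mnist))
print("CIFAR10 per-pixel l2: {:.2e}".format(cifar10))
```

Writing the shape out explicitly makes the divisor (784 for MNIST, 3072 for CIFAR-10) easy to check.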

chunchentu commented 5 years ago

Hello,

My guess is that you didn't take the square root of the l2 loss?

ZiangYan commented 5 years ago

I just used l2_avg from main.py L191.

ZiangYan commented 5 years ago

When we optimize the C&W loss, the square root of the l2 distance is not taken (see https://github.com/IBM/Autozoom-Attack/blob/master/blackbox_attack.py#L141-L145 ). This differs from Equation (1) in your AAAI paper.

Which one is correct: the square root of the l2 loss, or just the l2 loss?
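To make the two candidate measures concrete, here is a small sketch on a toy perturbation. The vector values are made up for illustration, and this is not the repository's code:

```python
import math

# Toy perturbation: element-wise difference between an adversarial
# image and the original image (hypothetical values).
delta = [0.3, -0.4, 0.0, 1.2]

# Squared L2 distance: the sum of squared differences, with no square root.
squared_l2 = sum(d * d for d in delta)

# Conventional L2 norm: the square root of the squared distance.
l2_norm = math.sqrt(squared_l2)

print("squared l2: {:.4f}".format(squared_l2))
print("l2 norm:    {:.4f}".format(l2_norm))
```

The two values coincide only when the norm is exactly 0 or 1, so an average of squared distances cannot be compared directly with an average of L2 norms.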

chunchentu commented 5 years ago

(1) Equation 1 is a general statement. We provide more information about our selection of Dist in Section 4.1, which is the squared L2 norm.

(2) The distortion is reported using the conventional measure (L2 norm)

(3) Could you first run the original code from GitHub? I just want to check whether you get results that agree with the paper. I am still looking into the issue you brought up, but this might take some time, since I suspect it might involve some problems with TensorFlow.

P.S. When I rerun the code, I get numbers similar to those reported in the paper:

python3 main.py -a zoo_ae -d cifar10 -n 50 --m 1000 \
 --switch_iterations 100 --init_const 10 \
 --codec_prefix codec/cifar10_2

ZiangYan commented 5 years ago

Hi, thanks for the prompt reply.

I've run the original code with the suggested arguments:

python3 main.py -a zoo_ae -d cifar10 -n 50 --m 1000 \
 --switch_iterations 100 --init_const 10 \
 --codec_prefix codec/cifar10_2

The log file is here.

I got an average l2 of 1.1516 (per-pixel 1.1516/3072=3.74e-4), which is about half of the 8.74e-4 reported in the paper. With these arguments, we perform 1000 ADAM iterations even if an adversarial image has already been found, so I believe this is not the initial success case.

chunchentu commented 5 years ago

You can parse the log with just a few lines of code to calculate the initial attack distortion. From your log, I get 2.1690 (2.1690/3072 = 7e-4), which is similar to the number reported in the paper.
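A minimal sketch of such a parser, under two assumptions: the log lines follow the `[STATS][FirstAttack]` print statement quoted earlier in this thread, and the logged `l2` field is the squared distance, so a square root is applied. The function name `initial_success_stats` and the sample log lines are hypothetical:

```python
import math
import re

# Capture the iteration and the trailing l2 field of each FirstAttack line.
FIRST_ATTACK = re.compile(
    r"\[STATS\]\[FirstAttack\] iter:(\d+),.*?\bl2:([0-9.eE+-]+)"
)


def initial_success_stats(log_text):
    """Return (iterations, l2 norms) for every first successful attack."""
    iterations, l2_norms = [], []
    for match in FIRST_ATTACK.finditer(log_text):
        iterations.append(int(match.group(1)))
        # Assumption: the log stores the squared l2 distance.
        l2_norms.append(math.sqrt(float(match.group(2))))
    return iterations, l2_norms


# Made-up log lines in the quoted format, for illustration only.
sample_log = (
    "[STATS][FirstAttack] iter:12, const:10, cost:26, time:0.512, "
    "size:(1, 32, 32, 3), loss:5.1, loss1:0.4, loss2:4.7, l2:4.0\n"
    "[STATS][FirstAttack] iter:30, const:10, cost:62, time:1.204, "
    "size:(1, 32, 32, 3), loss:9.9, loss1:0.9, loss2:9.0, l2:9.0\n"
)

iterations, l2_norms = initial_success_stats(sample_log)
print("avg l2 (initial success):", sum(l2_norms) / len(l2_norms))
```

Anchoring the pattern on `\bl2:` keeps it from matching the `loss2:` field that precedes it on the same line.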

ZiangYan commented 5 years ago

Thanks! Now I'm able to reproduce this number (2.1690).

ZiangYan commented 5 years ago

Hi, I'm now able to reproduce the ZOO+AE results on cifar10. When trying AutoZOOM+AE, I ran the following command:

python3 main.py -a autozoom_ae -d cifar10 -n 50 --m 1000 \
 --switch_iterations 100 --init_const 10 \
 --codec_prefix codec/cifar10_2

, and the log file is here.

I use the following script to analyze the log file:

import re

import numpy as np

# Read the whole attack log.
with open('../../Autozoom-Attack/logs/cifar10_autozoom_ae.log', 'r') as f:
    data = f.read()

# Each "FirstAttack" line records the first successful adversarial example for one image.
first_attacks = re.findall(r'FirstAttack.*', data)

# Take the square root of the logged l2 value before averaging.
l2 = np.array([float(re.search(r'\bl2:(\S+)', line).group(1)) for line in first_attacks])
print('avg l2 (initial success): {}'.format(np.sqrt(l2).mean()))

# Average query count, computed as 2 * (iteration + 1).
iteration = np.array([int(re.search(r'iter:(\d+)', line).group(1)) for line in first_attacks])
print('avg queries (initial success): {}'.format((1 + iteration.mean()) * 2))

, and for initial success I got average l2 = 5.2548, average number of queries = 158.53.

Although the number of queries is much smaller than the number in the paper (158.53 vs. 259.34), the per-pixel l2 is about 50% higher (5.2548/3072=1.7e-3 vs. 1.15e-3).
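The size of both gaps can be checked with a short sketch; `relative_change` is a hypothetical helper, and the numbers are the ones quoted above:

```python
def relative_change(measured, reported):
    """Signed relative difference of a measured value against a reported one."""
    return (measured - reported) / reported


# Average queries: reproduced run vs. the paper.
queries_change = relative_change(158.53, 259.34)

# Per-pixel l2: reproduced run (5.2548 over 3072 values) vs. the paper.
l2_change = relative_change(5.2548 / 3072, 1.15e-3)

print("queries:      {:+.1%}".format(queries_change))
print("per-pixel l2: {:+.1%}".format(l2_change))
```

A negative value means the reproduced number is lower than the reported one.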

Is this normal fluctuation, or am I missing something (e.g., using incorrect arguments)?

BTW, for cifar10 ZOO and AutoZOOM+BiLIN, should I set img_resize to something like half of the original image size as in MNIST?

chunchentu commented 5 years ago

Hi,

Different input images and random seeds may give different results, but larger distortion with fewer queries is the expected trade-off.