cleverhans-lab / cleverhans

An adversarial example library for constructing attacks, building defenses, and benchmarking both

possible gradient underflow in FastGradientMethod/BasicIterativeMethod #350

Closed. ricvo closed this issue 6 years ago.

ricvo commented 6 years ago

It seems that the perturbations computed by the FastGradientMethod (FGM) and BasicIterativeMethod attacks are sometimes equal to zero.

In a first preliminary test (not in cleverhans), the problem was due to the overconfidence of the network: the cross-entropy is very flat and the gradient very small, which caused an underflow. We solved this by switching to float64 precision in TensorFlow. I noticed cleverhans uses float32 (specified in several places in the code). Is there an easy way to use float64 precision instead, and/or do you see another workaround?
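To illustrate, here is a minimal numeric sketch (a toy two-class example, not the actual network from this report) of the underflow:

    import numpy as np

    def softmax(z):
        z = z - z.max()   # standard max-subtraction for numerical stability
        e = np.exp(z)
        return e / e.sum()

    # A very overconfident prediction.
    logits32 = np.array([110.0, 0.0], dtype=np.float32)
    logits64 = logits32.astype(np.float64)

    print(softmax(logits32))  # [1. 0.]              exp(-110) underflows in float32
    print(softmax(logits64))  # [1.0e+00  1.7e-48]   still representable in float64

With probabilities of exactly [1, 0], the cross-entropy gradient softmax(z) - onehot(y) on a correctly classified example is exactly zero, so the FGSM perturbation eps * sign(gradient) vanishes and the image is left unchanged.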

We are trying to produce FGSM adversarial examples for MNIST, using a simple fully connected 784-200-10 architecture. The trained network initially reaches about 96% accuracy on the test set, which drops to 48% on the adversarial examples generated with FGM (params: eps=0.1, ord=np.inf, clip_min=0, clip_max=1). However, we noted that 45% of the MNIST test set examples are not changed at all by the FGSM attack. Do you know if anyone else has had the same issue?

goodfeli commented 6 years ago

Do you have a minimal working code example?

A common cause of this problem is passing probabilities where the attack expects logits. Functions used to compute probabilities, such as the softmax and sigmoid function, often have strong saturation.

It is also possible in theory for the cross-entropy function itself to saturate and maybe that's what you've encountered, but I haven't observed that being a problem in practice yet.
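As a toy numeric illustration of that saturation (not cleverhans code): the derivative of sigmoid(z) is sigmoid(z) * (1 - sigmoid(z)), and once the probability rounds to exactly 1.0 in float32 that derivative is exactly zero, so no gradient can flow back through the probabilities, while float64 still keeps it.

    import numpy as np

    for dtype in (np.float32, np.float64):
        z = np.array([20.0], dtype=dtype)        # a very confident pre-sigmoid score
        p = 1.0 / (1.0 + np.exp(-z))             # stays in dtype
        print(dtype.__name__, p, p * (1.0 - p))  # float32: p == 1.0, derivative == 0.0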

ricvo commented 6 years ago

Thanks, goodfeli, for the quick answer. I tried passing logits instead of probs as you suggested, specifying the 'logits' keyword to the CallableModelWrapper. Unfortunately this did not solve the issue...

We trained a 784-200-10 network for 100 epochs on the MNIST training set. I also attach the weights if you want to give it a try (W0.npy, b0.npy, W1.npy, b1.npy). X_test is the MNIST test set.

    import numpy as np
    import tensorflow as tf
    from cleverhans.attacks import FastGradientMethod
    from cleverhans.model import CallableModelWrapper

    W0_np = np.load("experiments/W0.npy")
    b0_np = np.load("experiments/b0.npy")
    W1_np = np.load("experiments/W1.npy")
    b1_np = np.load("experiments/b1.npy")

    # tf.floatXX stands for tf.float32 or tf.float64, depending on the experiment
    W0 = tf.Variable(W0_np, dtype=tf.floatXX)
    b0 = tf.Variable(b0_np, dtype=tf.floatXX)
    W1 = tf.Variable(W1_np, dtype=tf.floatXX)
    b1 = tf.Variable(b1_np, dtype=tf.floatXX)

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())

    def nn(x):
        x = tf.cast(x, tf.floatXX)
        h = tf.matmul(x, W0) + b0
        h = tf.nn.relu(h)
        logits = tf.matmul(h, W1) + b1
        probs = tf.nn.softmax(logits)  # computed but unused; the wrapper is given logits
        return logits

    nn_model = CallableModelWrapper(nn, 'logits')
    fgsm = FastGradientMethod(nn_model, 'tf', sess)
    fgsm_params = {'eps': 0.1, 'ord': np.inf, 'clip_min': 0, 'clip_max': 1}
    fgsm_adv_test = fgsm.generate_np(X_test, **fgsm_params)

We generated 3 sets of adversarial examples: fgsm_ch_float32.npy, fgsm_ch_float64.npy, and fgsm_tf_float64.npy. The first two were generated with cleverhans using float32 and float64 in the code above; the last was generated with our own TensorFlow code using float64 everywhere.

The results are quite different: both sets generated with cleverhans give around 48% accuracy, with around 45% of the images not modified at all, whereas the examples generated directly in TF with float64 drop the accuracy to around 3% and every image is modified by FGSM.

I suspect this is because the cleverhans attacks use float32 throughout; is there any possible workaround? We also tried changing to float64 in attacks.py and attacks_tf.py, but not much changed, so I am sure we missed something somewhere. In doing so we also broke other attack methods, for example CW. Thanks for any insight on this. Cheers, Riccardo

weights.zip

iamgroot42 commented 6 years ago

If switching to float64 didn't change anything, then isn't it most likely some other problem?

ricvo commented 6 years ago

Dear iamgroot42, thanks for the answer. Focusing on the FGSM algorithm, the percentage of unchanged images with cleverhans (45%) and the accuracy (50%) are very similar to what our own TF implementation gives with float32. When we switch that implementation to float64 the problem disappears. That is why I believe float32 is the issue.

Now, I managed to solve the problem by changing float32 to float64 in attacks.py and attacks_tf.py; I also changed the tf.to_float call (line 53 of attacks_tf.py) to y = tf.cast(tf.equal(preds, preds_max), dtype=tf.float64). This way I generate adversarial examples that lead to an accuracy of 5.8%, which seems much better.

I wonder if there is an easy way to specify the desired floating-point precision in cleverhans. If not, this would be very easy to implement and very useful for users: right now I need to manually edit the library every time I change precision, which is not ideal. I think the best way of doing this would be to let the user specify the dtype of the x and y placeholders, do you agree?

iamgroot42 commented 6 years ago

Agreed. Letting the user specify a placeholder would be the best workaround for this.

goodfeli commented 6 years ago

The line you changed (line 53 of attacks_tf.py) isn't actually involved in this code snippet; y is provided by get_or_guess_labels in attacks.py.

goodfeli commented 6 years ago

I can reproduce this problem, but for both float32 and float64.

goodfeli commented 6 years ago

BTW, if you want to make your own placeholder, or use any other kind of input tensor, just use the generate method rather than generate_np.
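For example, roughly (a sketch based on the snippet earlier in this thread, reusing its sess, nn and X_test; whether a float64 placeholder works end to end depends on the dtype fixes discussed below):

    import numpy as np
    import tensorflow as tf
    from cleverhans.attacks import FastGradientMethod
    from cleverhans.model import CallableModelWrapper

    x = tf.placeholder(tf.float64, shape=(None, 784))  # your own input tensor
    nn_model = CallableModelWrapper(nn, 'logits')
    fgsm = FastGradientMethod(nn_model, 'tf', sess)
    x_adv = fgsm.generate(x, eps=0.1, ord=np.inf, clip_min=0., clip_max=1.)
    fgsm_adv_test = sess.run(x_adv, feed_dict={x: X_test})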

goodfeli commented 6 years ago

OK, I had to edit the code in a few more places to make everything float64. With everything successfully float64, I am able to reproduce the problem at float32 and then make it go away by switching to float64.

goodfeli commented 6 years ago

3.7M entries of the gradient on the input get rounded to 0 for float32, none for float64
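A possible way to measure this (a sketch, not goodfeli's actual code; it assumes an input placeholder x, a one-hot label placeholder y, the nn function from the snippet above, and a label array Y_test):

    loss = tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=nn(x))
    grad = tf.gradients(loss, x)[0]
    num_zero = tf.reduce_sum(tf.cast(tf.equal(grad, 0.), tf.int64))
    print(sess.run(num_zero, feed_dict={x: X_test, y: Y_test}))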

ricvo commented 6 years ago

Yes, that seems to be exactly the problem I mentioned!

So, I understand from your answer that you suggest I use generate. This way I can provide my own placeholder for x, and I could also modify construct_graph to take an extra argument, the dtype of the x placeholder, so that I do not need to build the feedable and fixed parts of the graph in my external code. But I still see some problems:

  1. the feedable kwargs types are specified as float32 in the __init__ of Attack
  2. I am quite sure tf.to_float always returns float32, and there are several uses of it in the code
  3. in some generate methods float32 is referenced explicitly (e.g. DeepFool, ElasticNetMethod, CarliniWagnerL2, SaliencyMapMethod). I am not sure how critical float64 is for those attacks, but calling generate with a float64 placeholder might create conflicts?

I think the best solution might be to pass a TF dtype (with a check for supported types, initially only float32 and float64 I guess) to Attack.__init__; it could then be stored on the attack object and reused wherever needed, in all the places mentioned in 1-2-3, along the lines of the sketch below. Or how would you advise me to proceed?
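Just a sketch of the interface I have in mind (hypothetical, not the current cleverhans code):

    import numpy as np
    import tensorflow as tf

    class Attack(object):
        def __init__(self, model, back='tf', sess=None, dtypestr='float32'):
            if dtypestr not in ('float32', 'float64'):
                raise ValueError('only float32 and float64 are supported')
            self.model = model
            self.back = back
            self.sess = sess
            self.tf_dtype = tf.as_dtype(dtypestr)  # for the x/y placeholders and internal casts
            self.np_dtype = np.dtype(dtypestr)     # for the feedable kwargs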

goodfeli commented 6 years ago

I looked into this a bit more yesterday.

I didn't realize that the machinery for generate_np interfered with the machinery for generate so much. I'm pretty annoyed about that, and I think the solution might just be to remove generate_np.

Another potential solution would be to try to get TensorFlow to support a floatX feature similar to what Theano has. In the meantime we could make a cleverhans.floatX and then phase it out after TensorFlow adds it.
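Sketched out, a cleverhans.floatX would look something like this (hypothetical; it mirrors Theano's floatX flag and was only an idea at this point):

    # hypothetical cleverhans/floatX.py
    import tensorflow as tf

    floatX = tf.float32

    def set_floatX(dtype):
        """Set the default float precision used when building attack graphs."""
        global floatX
        floatX = tf.as_dtype(dtype)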

goodfeli commented 6 years ago

I'm going to start a conversation on cleverhans-dev

goodfeli commented 6 years ago

Update: it looks like generate_np doesn't actually interfere with generate, though generate_np itself doesn't support float64.

I suggest just sending small PRs like this one whenever you encounter something that doesn't support float64: https://github.com/tensorflow/cleverhans/pull/356

ricvo commented 6 years ago

Ok, sure

Thanks for the feedback, I will try to send some small PRs like the one you showed over the next few days.

ricvo commented 6 years ago

I can contribute to this; I will work on it and submit a pull request. I have two questions first, because before I modify things I want to be sure I fully understand the architecture so I can test it properly:

  1. Could you point me to a minimal working example that uses generate for the code I pasted above, where I was using generate_np instead? (The very simple network, just for understanding.)
  2. Let's say I already have a TF graph going from node x to node logits, stored in a tf.Session inside an object of my own Python module. Can I somehow pass this graph to cleverhans' generate to produce adversarial examples for that network, or should I pass the whole function that builds the graph nodes from x to logits? (In my understanding, for Carlini-Wagner the graph is used to produce the output from an image treated as a variable, self.output = model.get_logits(self.newimg), whereas with FGSM it seems to work.) So in general it is not possible to just reference nodes I already have, and one has to pass the whole graph-building function instead. Would it be possible to avoid this, and simply "augment" the graph I already have in my session with the nodes needed for adversarial generation?

ricvo commented 6 years ago

Ok, I think for point 1 it was quite easy (correct me if there is a better way):

    def nn(x):
        return create_logits(x)

    sess = tf.Session()
    x = tf.placeholder(tf_dtype, shape=[None] + input_shape)
    nn_model = CallableModelWrapper(nn, 'logits')
    with sess.graph.as_default():
        attack = AttackClass(nn_model, 'tf', sess, dtypestr=dtypestr)
        x_adv = attack.generate(x, **params)
    x_adv_test = sess.run(x_adv, {x: x_test})

The question in point 2 still remains, though.

npapernot commented 6 years ago

Regarding point 2, you would have to provide a model object derived from cleverhans.model.Model if you are going to call generate on the FastGradientMethod.

If you'd like to use the graph you already created, it is not recommended, but you can call the function that FastGradientMethod uses in the backend; it is here: https://github.com/tensorflow/cleverhans/blob/master/cleverhans/attacks_tf.py#L23
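For the first option, a rough sketch (illustrative names, not code from this thread) of wrapping an existing graph-building function as a cleverhans Model. The key point is that the attack must be able to re-apply your forward pass to its own tensors (e.g. CW's self.newimg), so your function has to reuse its existing variables rather than hand over already-built nodes:

    import tensorflow as tf
    from cleverhans.model import Model

    class LogitsModel(Model):
        def __init__(self, logits_fn):
            self.logits_fn = logits_fn  # maps an input tensor to logits, reusing your weights

        def fprop(self, x):
            logits = self.logits_fn(x)
            return {'logits': logits, 'probs': tf.nn.softmax(logits)}

CallableModelWrapper already does essentially this for a plain callable.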

npapernot commented 6 years ago

I am closing this for now, feel free to reopen if #395 did not completely address your issue.