cihangxie / DI-2-FGSM

Improving Transferability of Adversarial Examples with Input Diversity

Confusion about image size #6

Closed qilong-zhang closed 5 years ago

qilong-zhang commented 5 years ago

Hi, I'm very interested in your paper, but I have a question about the image shapes. Could you help me? In your code, the method input_diversity() returns either padded or input_tensor. For padded, the code is:

padded = tf.pad(rescaled, [[0, 0], [pad_top, pad_bottom], [pad_left, pad_right], [0, 0]], constant_values=0.)
padded.set_shape((input_tensor.shape[0], FLAGS.image_resize, FLAGS.image_resize, 3))

and from this code, we can see that the shape of padded is batch_size x 330 x 330 x 3.

But for input_tensor:

  batch_shape = [FLAGS.batch_size, FLAGS.image_height, FLAGS.image_width, 3]

  with tf.Graph().as_default():
    # Prepare graph
    x_input = tf.placeholder(tf.float32, shape=batch_shape)

and from this code, we can see that the shape of input_tensor is batch_size x 299 x 299 x 3.

Why are they different? Is it ok?
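
For reference, the whole function reads roughly like this (condensed from attack.py in the repo; treat it as a paraphrase rather than the verbatim source):

def input_diversity(input_tensor):
  # Randomly rescale to rnd x rnd, with rnd in [image_width=299, image_resize=330)
  rnd = tf.random_uniform((), FLAGS.image_width, FLAGS.image_resize, dtype=tf.int32)
  rescaled = tf.image.resize_images(input_tensor, [rnd, rnd],
                                    method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
                                    align_corners=True)
  # Randomly zero-pad back up to image_resize x image_resize
  h_rem = FLAGS.image_resize - rnd
  w_rem = FLAGS.image_resize - rnd
  pad_top = tf.random_uniform((), 0, h_rem, dtype=tf.int32)
  pad_left = tf.random_uniform((), 0, w_rem, dtype=tf.int32)
  padded = tf.pad(rescaled, [[0, 0], [pad_top, h_rem - pad_top],
                             [pad_left, w_rem - pad_left], [0, 0]], constant_values=0.)
  padded.set_shape((input_tensor.shape[0], FLAGS.image_resize, FLAGS.image_resize, 3))
  # With probability prob apply the transform, otherwise pass the input through unchanged
  return tf.cond(tf.random_uniform(shape=[1])[0] < tf.constant(FLAGS.prob),
                 lambda: padded, lambda: input_tensor)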

cihangxie commented 5 years ago

Thanks for your interest in our work.

The input diversity transform is instantiated as a layer inside the network, so it is fine for input_tensor and padded to have different shapes.
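
A minimal sketch of the wiring (assuming the slim InceptionV3 wrapper; the exact call in attack.py may differ slightly):

# x_input keeps its 299x299 static shape; only the tensor flowing into the
# conv stack is (sometimes) 330x330, because the transform is applied on the
# way into the classifier.
logits, end_points = inception_v3.inception_v3(
    input_diversity(x_input), num_classes=1001, is_training=False)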

qilong-zhang commented 5 years ago

@cihangxie Hi, thank you for your reply! But I still don't understand the sentence "The input diversity is instantiated as a layer inside the network." If the perturbation is generated from padded, then the perturbation's shape will be [330, 330, 3]. It's fine to add this perturbation to a tensor of the same shape (padded), but the original image is [299, 299, 3]. How can you add it to the original image? Or, when you test the attack success rate, do you use the [330, 330, 3] images to calculate the result?

cihangxie commented 5 years ago

Oh, now I understand what you are asking.

You can refer to the line here: https://github.com/cihangxie/DI-2-FGSM/blob/master/attack.py#L137, where we take the derivative of the loss w.r.t. the original image (299x299x3), so the generated perturbation is also of size 299x299x3. In this sense, you can interpret it as the perturbation being resized back to 299x299x3.
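
A sketch of that step (cross_entropy, alpha, and x are illustrative names, not necessarily the ones in attack.py):

# The loss depends on x only through input_diversity(x), but the derivative
# is taken w.r.t. x itself, so the gradient (and hence the perturbation)
# automatically has x's shape, batch x 299 x 299 x 3.
noise = tf.gradients(cross_entropy, x)[0]
x_adv = x + alpha * tf.sign(noise)   # FGSM-style step; epsilon clipping omitted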

qilong-zhang commented 5 years ago

@cihangxie Ok, thanks! That's what I guessed before.

zjysteven commented 4 years ago

@cihangxie Hi! I have another detailed question. Is the size after rescaling (330x330 in your implementation) the proper input size for the network? In other words, is the model supposed to take 330x330 inputs according to its architecture?

cihangxie commented 4 years ago

The default input size is 299x299. But for ImageNet-trained models, performance usually will not degrade if inputs are padded to a reasonably larger size (although too large a size will significantly hurt performance).
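
A hypothetical way to verify this (logits_clean and the reuse=True call are assumed to come from an earlier, already-built inception_v3 graph):

# Zero-pad a clean 299x299 batch to 330x330 and compare top-1 predictions
# against the unpadded baseline; agreement should stay high per the comment above.
padded_x = tf.pad(x_input, [[0, 0], [0, 31], [0, 31], [0, 0]], constant_values=0.)
logits_pad, _ = inception_v3.inception_v3(padded_x, num_classes=1001,
                                          is_training=False, reuse=True)
agree = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(logits_pad, 1),
                                        tf.argmax(logits_clean, 1)), tf.float32))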

zjysteven commented 4 years ago

I see. Thanks for the clarification!