carlini / nn_robust_attacks

Robust evasion attacks against neural networks to find adversarial examples
BSD 2-Clause "Simplified" License

l0 attack: Potential Bug #20

ankur6ue closed this issue 6 years ago

ankur6ue commented 6 years ago

Apologies if I'm missing something obvious here, but in the l0 attack, shouldn't the valid[e] = 0 be after the breaks?

We only want to set a pixel to "don't change" if (1) its total change is below the threshold and (2) we haven't already changed too many pixels. Setting valid[e] = 0 before the breaks invalidates the pixel regardless of those checks, doesn't it?

    did = 0
    for e in np.argsort(totalchange):
        if np.all(valid[e]):
            did += 1
            valid[e] = 0

            if totalchange[e] > .01:
                # if this pixel changed a lot, skip
                break
            if did >= .3*equal_count**.5:
                # if we changed too many pixels, skip
                break

Also, you haven't implemented the random starts in the l2 attack in this repo, correct? The paper says:

We randomly sample points uniformly from the ball of radius r, where r is the closest adversarial example found so far.

Does this r depend on the source/target label for a given image? That is, do we choose r based on the closest adversarial example for the target class under consideration (r would vary significantly with the target class, since adversarial examples for some classes are harder to find than for others)? What initial value did you pick?

carlini commented 6 years ago
  1. We want at least one pixel to be fixed (set to "don't change") on each iteration of the loop. The abort criteria are there so that we don't fix more than one pixel unless it looks like we haven't made a substantial change. If the assignment came after the breaks, we might enter an infinite loop where we never decrease the size of the valid set; the first sketch below illustrates this.

  2. Correct, this doesn't implement the random restarts. Initially, solve with no random start and find the nearest adversarial example. Then, on future iterations, pick a random perturbation with magnitude less than the best solution found so far (for this exact adversarial example attempt, which, yes, depends on the source/target class); the second sketch below outlines the idea.
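
To make point 1 concrete, here is the reordering the question proposes, sketched as hypothetical code (not what the repo does); with valid[e] = 0 moved after the breaks, an iteration can break out before clearing any entry of valid:

    # Hypothetical reordering of the l0 loop: valid[e] = 0 moved after the breaks.
    did = 0
    for e in np.argsort(totalchange):
        if np.all(valid[e]):
            did += 1
            if totalchange[e] > .01:
                # if even the least-changed valid pixel changed a lot, we break
                # here and the valid[e] = 0 below is never reached
                break
            if did >= .3*equal_count**.5:
                break
            valid[e] = 0

If the first break fires on the first valid pixel, no entry of valid is cleared, so the next outer iteration sees the same valid set and the attack can loop forever.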
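
For point 2, a minimal sketch of the restart scheme described above, assuming a hypothetical helper attempt(x, start) that runs one L2 attack from a given starting point and returns an adversarial example or None (the names here are illustrative, not from this repo):

    import numpy as np

    def l2_with_random_restarts(attempt, x, n_restarts=10):
        # First attempt: no random start.
        best = attempt(x, start=x)
        best_dist = np.linalg.norm(best - x) if best is not None else np.inf

        for _ in range(n_restarts):
            if best is None:
                # Nothing found yet, so keep starting from the original image.
                start = x
            else:
                # Sample uniformly from the ball whose radius is the distance
                # to the closest adversarial example found so far.
                direction = np.random.randn(*x.shape)
                direction /= np.linalg.norm(direction)
                radius = best_dist * np.random.rand() ** (1.0 / x.size)
                start = x + radius * direction
            adv = attempt(x, start=start)
            if adv is not None and np.linalg.norm(adv - x) < best_dist:
                best, best_dist = adv, np.linalg.norm(adv - x)
        return best

The radius scaling rand() ** (1/d) is what makes the sample uniform over the volume of the d-dimensional ball rather than concentrated near its center.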