Embeddave opened this issue 2 years ago
Hi @Embeddave Thank you very much for your proposal, let me take a closer look.
@Embeddave I think you have found a bug. The algorithm should be ok, but it works only on a single sample, which is also reflected by requiring `batch_size=1`. If more than one sample is provided in the `x` and `y` arguments to `generate`, it only considers the first sample. I think we need to add a loop over all samples in `x` to generate adversarial examples for all input samples.
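For concreteness, a minimal sketch of the kind of loop I have in mind (the `_generate_single` helper is hypothetical shorthand for the existing single-sample SimBA logic, not the actual ART code):

```python
# Rough sketch only -- not the actual ART implementation.
# Assumes a hypothetical helper `_generate_single(x_i, y_i)` that runs the
# existing single-sample attack and returns one adversarial example.
import numpy as np

def generate(self, x: np.ndarray, y: np.ndarray = None, **kwargs) -> np.ndarray:
    """Generate adversarial examples for every sample in `x`."""
    x_adv = x.copy()
    for i in range(x.shape[0]):
        # Run the single-sample attack on the i-th input (and its probabilities, if given).
        y_i = None if y is None else y[i : i + 1]
        x_adv[i : i + 1] = self._generate_single(x[i : i + 1], y_i)
    return x_adv
```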
@Embeddave May I ask you for a review and test of PR #1422 aiming to fix this issue?
Thank you for working on this. I replied on the PR -- I could review and test early next week if that would be ok.
Hi @beat-buesser
I see now that this might have gotten lost in the way I described the issue here. I'm sorry I was unclear.
You're correct that it's a bug (and confusing!) that the original code was written to only accept a batch size of 1, and it's good that you are fixing it.
But what is more confusing to me is that the attack specifically expects probabilities from the model. That's not clear from the argument names, the docstrings, or the code.
At first I naively used an estimator that returns logits, and found that the attack takes much longer to run. In some cases I think it may fail silently, but I have not verified that.
I found that I needed to add a `Softmax` layer to my `torch` model to get the attack to succeed in any reasonable length of time. That should have been obvious from how the attack is derived in the paper, but it's not clear from the `art` code.
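For anyone who hits the same thing, this is roughly what I ended up doing (a sketch, assuming a simple image classifier; `MyNet`, the input shape, and the class count are placeholders):

```python
# Sketch: append a Softmax so the estimator returns probabilities, not logits.
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier

logits_model = MyNet()  # hypothetical torch.nn.Module that returns (N, C) logits
prob_model = nn.Sequential(logits_model, nn.Softmax(dim=1))

classifier = PyTorchClassifier(
    model=prob_model,
    loss=nn.CrossEntropyLoss(),  # required by the wrapper, not used by the black-box attack itself
    input_shape=(3, 32, 32),     # example shape, adjust to your data
    nb_classes=10,
)
# classifier.predict(x) now returns class probabilities, which is what SimBA expects.
```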
It's clearer (to me at least) from the original implementation, where the variable is explicitly named `y_prob`.
That's what I meant when I suggested possibly changing the parameter name or its definition in the docstring. Another option may be to alias it within the `generate` function so that it's explicitly `y_prob = y`.
Even the tests just use the standard white-box model that returns logits instead of probabilities, as far as I can tell. This threw me off, because I looked to the tests to determine whether I was running the attack correctly. If the intent of the tests is to do more than a smoke test and actually verify that the attack succeeds, maybe a black-box estimator should be used as a fixture? And/or a standard `Estimator` that has been modified to return probabilities?
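Something like this is what I have in mind for a fixture (a sketch only; `image_classifier` stands in for whatever fixture the ART test suite actually uses to build its standard model):

```python
# Sketch of a pytest fixture that wraps the standard test model so that
# predict() returns probabilities instead of logits.
import numpy as np
import pytest


def _softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


@pytest.fixture
def probability_classifier(image_classifier):  # `image_classifier`: hypothetical existing fixture
    class _ProbWrapper:
        def __init__(self, clf):
            self._clf = clf

        def predict(self, x, **kwargs):
            return _softmax(self._clf.predict(x, **kwargs))

        def __getattr__(self, name):
            # Delegate everything else (input_shape, nb_classes, ...) to the wrapped classifier.
            return getattr(self._clf, name)

    return _ProbWrapper(image_classifier)
```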
Whatever could be done to make it clear that the attack needs a model that returns probabilities would be good for future users of the attack.
Hi @Embeddave Thank you, that's very valuable feedback! I have added an additional commit to #1422 to integrate your proposal. What do you think?
Is your feature request related to a problem? Please describe.
Currently the docstring for `SimBA.generate` states that `y` should be class labels (as is true for most `art` attacks): https://github.com/Trusted-AI/adversarial-robustness-toolbox/blob/85ca10ff6b696dbaa66dcf179d0493b12ff594bb/art/attacks/evasion/simba.py#L104
But actually in this case the `SimBA.generate` method expects a `(1, c)` matrix of class probabilities.

Describe the solution you'd like
I understand there are trade-offs when trying to provide a consistent API, but it's definitely confusing that the docstring disagrees with what the code expects.
Maybe keep the argument name but change the docstring?
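Roughly the kind of docstring change I mean (the wording is just a suggestion, and the surrounding class is abbreviated):

```python
from typing import Optional

import numpy as np


class SimBA:  # abbreviated; only the docstring change is the point here
    def generate(self, x: np.ndarray, y: Optional[np.ndarray] = None, **kwargs) -> np.ndarray:
        """
        Generate adversarial samples and return them in an array.

        :param x: An array with the original inputs to be attacked.
        :param y: Class probabilities of shape `(nb_samples, nb_classes)` for the inputs in `x`,
                  not integer or one-hot class labels. If `None`, the probabilities returned by
                  the classifier's `predict` are used.
        :return: An array holding the adversarial examples.
        """
        ...
```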
Describe alternatives you've considered

- Don't allow the `y` argument and just always use the classifier's predictions directly. It's not clear to me what's gained by allowing a user to do exactly the same thing the attack class already does in this case. I guess a user could provide different values for the probabilities if what the classifier gives is somehow not desired, say, rounded?
- Change the argument name to something else, say `y_probs`, to be completely explicit? It's also not great when an argument with a consistent name has different semantics depending on the class. Maybe all black-box attacks could adopt this parameter.

Not sure if either of those is better, just putting alternatives out there.
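If the name `y` stays, one more option (just a sketch of the idea, not ART code) would be to alias and sanity-check it inside `generate`, so a user who passes labels or logits gets an immediate, readable error:

```python
import numpy as np


def _as_probabilities(y: np.ndarray) -> np.ndarray:
    """Alias `y` as probabilities and fail loudly if it doesn't look like them."""
    y_prob = np.asarray(y, dtype=float)
    if y_prob.ndim != 2:
        raise ValueError("`y` must be a 2-D array of class probabilities, shape (nb_samples, nb_classes).")
    rows = y_prob.sum(axis=1)
    if np.any(y_prob < 0.0) or not np.allclose(rows, 1.0, atol=1e-3):
        raise ValueError("`y` does not look like probabilities (rows must be non-negative and sum to 1); "
                         "did you pass labels or logits?")
    return y_prob
```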