evaluating-adversarial-robustness / adv-eval-paper

LaTeX source for the paper "On Evaluating Adversarial Robustness"
https://arxiv.org/abs/1902.06705

road signs threat model #4

Open earlenceferns opened 5 years ago

earlenceferns commented 5 years ago

The document mentions that attacks on road signs might not be well motivated from a financial perspective. This may or may not be true; however, an attacker's goals extend well beyond the financial. Given how broad this document aims to be, I think the threats considered should be similarly broad. For example, there could still be a financial motivation: competing car manufacturers might want to make other brands look bad because their cars fail to recognize certain types of signs or objects.

So I do not see the point of caveating this threat model with a statement that essentially reads as the threat not being important. I think a rewording would be more informative and useful. For example: although road signs are an often-cited threat model, a larger goal of that line of work is to understand the physical vulnerability of machine learning models that can affect physical systems (and then cite relevant work there). On that note, I also noticed that this document has almost nothing on physical attacks. This is an important line of research IMO, and I think future versions of the doc should include text describing that area -- happy to contribute. Including it makes sense because the L-norms of physical attacks tend to be far larger than what we see in digital attacks. It would therefore serve as nice evidence that simply comparing L-norms is not an effective way of comparing the robustness of models, because those values depend on the context in which the model is applied.
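To make that last point concrete, here is a rough numerical sketch (the image size, patch location, and perturbations below are hypothetical, not measurements from any published attack): a sticker-style perturbation that fully replaces a small patch of pixels has a much larger norm than a typical digitally-bounded perturbation, even though it touches only a few percent of the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 224x224 RGB image with pixel values in [0, 1].
# Digital attack: global perturbation bounded by eps = 8/255 in l_infinity.
digital = rng.uniform(-8 / 255, 8 / 255, size=(224, 224, 3))

# Physical "sticker": zero almost everywhere, but a 40x40 patch is replaced
# entirely, so individual pixels can change by up to 1.0.
physical = np.zeros((224, 224, 3))
physical[100:140, 100:140, :] = rng.uniform(-1.0, 1.0, size=(40, 40, 3))

for name, delta in [("digital", digital), ("physical sticker", physical)]:
    print(f"{name:16s}  l_inf = {np.abs(delta).max():.3f}   "
          f"l_2 = {np.linalg.norm(delta.ravel()):.1f}")

# The sticker's l_infinity norm (~1.0) dwarfs the digital bound (~0.031),
# so comparing raw norm values across these two settings says little about
# which model is more robust.
```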

ftramer commented 5 years ago

I agree that discussing physical attacks (and e.g., adversarial patches, the whole line of work on audio attacks, etc.) could be a great addition. As far as I know, there have been few defenses proposed in that space, so discussing some of the differences in evaluation requirements and threat models could be interesting. There is also reason to believe that defending against physically realizable attacks is easier. For example, adversarial patches seem to inherently need to be "overly salient" to generalize to arbitrary inputs and conditions, which makes them easy(ier) to detect (we tried to do this in https://arxiv.org/abs/1812.00292).
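To illustrate the kind of check this enables, here is a toy heuristic capturing only the general idea (not the method from the paper linked above): flag inputs whose input-gradient saliency is concentrated in a small region, which is a signature of localized patch attacks.

```python
import torch

def saliency_concentration(model, x, top_frac=0.02):
    """Toy patch-detection heuristic (illustrative only): fraction of the
    input-gradient saliency mass that falls in the top `top_frac` of pixels.
    Localized adversarial patches tend to produce high values."""
    x = x.clone().requires_grad_(True)
    logits = model(x)                          # shape [batch, num_classes]
    logits.max(dim=1).values.sum().backward()  # gradient of the top logit
    sal = x.grad.abs().sum(dim=1).flatten(1)   # per-pixel saliency, [batch, H*W]
    k = max(1, int(top_frac * sal.shape[1]))
    top_mass = sal.topk(k, dim=1).values.sum(dim=1)
    return (top_mass / sal.sum(dim=1)).detach()  # in [0, 1]; higher = more patch-like
```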

adamshostack commented 5 years ago

I want to amplify what @earlenceferns is saying. I have long argued that understanding attacker motivation is not needed to understand attacks, and is a source of avoidable error. [1] There are attacks for political reasons and "for the lulz." I would have argued that no car maker would ship code to cheat on emissions tests, and I would have been as confident in that argument as in the argument that there's no financial motive to mess with signs.

Lastly, I can easily imagine striking cab drivers wanting to harm self-driving cars and having a financial motivation to do so.

My point is not against footnote 2, but against the evaluation of goals in general.

[1] https://adam.shostack.org/blog/2016/04/think-like-an-attacker-is-an-opt-in-mistake/ and more generally in my threat modeling book.

carlini commented 5 years ago

Response in two parts:

(1) Physical attacks. Definitely. I think this would warrant some discussion, but I don't want to make it a big focus just yet. This is mainly a paper on how to evaluate defenses against adversarial examples, not a paper on the research space of adversarial examples (even though there's definitely some scope creep). Currently there are not many defenses designed to work only in the physical world. Maybe that will change in the future, but for now I want to focus on the common case and get people to do that well. If we start seeing a bunch of poorly performed security evaluations that focus on the physical world (and, for the record, Florian's evaluation is well done), then it would completely make sense to add much more content here. Most of the paragraphs were written because one of us saw some defense paper say something we wanted to correct.

(2) The footnote. So I'm the one who added that footnote. I added it mainly because basically every defense paper with a motivation section says "self-driving cars and stop signs!", and that bothers me. I didn't feel comfortable repeating this argument without saying why I don't like it. Let me expand briefly on why.

First, people usually mention this as an attack someone would actually carry out in practice. I don't believe it. The only motivation I can see for doing this is causing harm to people, and most people don't want to do that. There are far easier ways to cause serious harm to people driving cars than by putting stickers on stop-signs. Doing this is still illegal. If a cab driver wants to harm self-driving cars they could throw rocks at them. (Okay, a bit of a straw-man, but you get the idea.) Again, still illegal.

(I could be wrong here---maybe there is some real motivation for these kinds of attacks that's not causing bodily harm to people? I don't believe in the kidnapping/"get off my lawn" person who doesn't want cars driving on their street threat models...)

I make the "financial motivation" comment because the vast majority of attacks happen only when there is a financial motivation to do so. And if a paper's argument is "this is well motivated" then one really compelling way to argue this is to show a financial motivation. People never really cared about attacking websites when all you could do was deface them to say "h4x0r3d". Once credit cards became prevalent online, attacks started to actually make sense. So if some paper says this is something that will happen in practice, I don't think that's actually a good argument. The paper should pick something better, like NSFW detection or Florian's advertisement attacks, that actually might happen.

Now maybe someone is a luddite and wants to cause mass harm to all self-driving cars. Or maybe the military wants to cause their enemies' self-driving tanks to crash. These are certainly possible threats. I just don't see them as worth including in a defense paper. (Just as in a security paper justifying some new bounds-checking defense, you don't say "when we're at war the enemy military is going to exploit my system".) If you are trying to defend against someone who is out to crash your car, you have serious problems, and they're not going to respect your threat model. Which brings me to ....

The next reason I really don't like this threat model is that it's highly specific. As you all observe above, defending against physical attacks is a very different, and probably easier, challenge. If your actual objective is defending against stickers on stop signs, then build a defense that stops that, not something which prevents me from changing an airplane into a frog with an l_infinity distortion of eps=8/255.
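For concreteness, the "digital" threat model being contrasted here is the usual norm-bounded one, roughly captured by the following minimal PGD sketch (`model`, `x`, `y`, and the hyperparameters are placeholders for illustration; this is not code from the paper):

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Minimal l_infinity PGD sketch. `model`, `x` (images in [0, 1]), and
    `y` (labels) are placeholders; this is illustrative only."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Take a signed gradient ascent step, then project back into
            # the eps-ball and the valid pixel range.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-eps, eps)
            delta.copy_(torch.clamp(x + delta, 0, 1) - x)
        delta.grad.zero_()
    return (x + delta).detach()
```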

earlenceferns commented 5 years ago

First, let's separate personal opinions from scientific claims. Whether any single individual believes something does not make it any more or less real, or more or less true. Second, the point about "people not wanting to harm other people" is strange. This is equivalent to saying that bad people (or, as Adam says, people who are in it for the lulz) who simply want to do harm will somehow become saints when they see self-driving cars. Third, there might be some confusion about whether this is actually a threat model. It seems to be more of a specific attack than a threat model. I think the actual threat model here should be a "physical attacker" without the ability to digitally add or subtract values from the input vector.

Furthermore, there might be goals other than "simply" causing harm to people. For instance, an attack might change the interpretation of a traffic sign and then place other objects strategically to make it look as if the other three roads at a four-way intersection are blocked. If an adversary does this at strategically chosen points along a planned route, they might be able to detour a vehicle to an area the attacker controls. So the point here is that mis-recognizing road objects might serve higher-level goals that we do not know about.

Now, of course I agree with you that if a paper simply parrots the idea that "oh, put stickers on signs, and cars magically crash", and then goes on to describe a defense against digital attackers, that does not make a whole lot of sense, and future papers should be discouraged from using that as their only motivation (caveats below). This bothers me as well. If they do use it, they should provide a more nuanced explanation of why it is a credible threat. I believe it is much more constructive to point out the error of their ways and guide them to be a bit more explicit about what they are trying to do and why they are worried about stickers on stop signs (or stickers on objects in general). As Adam mentioned, understanding an attack today does not always require iron-clad motivations (those might become apparent in the future).

So my suggestion for authors building defenses and talking about motivations:

  1. "Self-driving cars + sticker on stop sign = crash" is not a threat model. It is an attack.
  2. No one has actually demonstrated that a car can crash if it misinterprets traffic signs, so do not write as if this were a given. In fact, demonstrating it is its own research question. If using this as motivation, state that you are concerned about physical attackers.
  3. Introduce the "physical attacker" as a separate kind of attacker (much as classic security has a remote attacker, an insider attacker, etc.). Now that I think about it, this document might benefit from a listing of the types of attackers that could manifest in the context of ML attacks, and people evaluating defenses should state what type of attacker they are defending against.
  4. Papers should use threat models appropriate to their work. E.g., if a paper claims a defense within some small L-norm bound, then it is most likely concerned with digital attackers and should not parrot the "self-driving car + stop-sign crash" scenario as motivation.

carlini commented 5 years ago

Completely agree with all four of those suggestions.

(https://arxiv.org/abs/1807.06732 discusses point 4 in part. But including some more discussion wouldn't be bad.)

adamshostack commented 5 years ago

I also like @earlenceferns' suggestions, and would augment point 3 with a discussion of capability vs. motivation: an insider attack might come from a disgruntled employee or from an account takeover. The motives differ, but the ability to exploit granted permissions is the same.

earlenceferns commented 5 years ago

By the way, I forgot to mention, thanks for taking the initiative on creating the document. And I'm happy to help with any discussion in the text.

rahmati commented 5 years ago

@carlini There are two parts to your argument:

(1)

the vast majority of attacks happen only when there is a financial motivation to do so. And if a paper's argument is "this is well motivated" then one really compelling way to argue this is to show a financial motivation.

While I agree that "the vast majority of attacks happen only when there is a financial motivation to do so", it is not a general rule. Also, monetization doesn't need to happen immediately. For example, any attack that enables us to do privilege escalation or access memory we are not supposed to is valuable, regardless of its difficulty or whether it is "immediately monetizable" or not. Think Rowhammer, Spectre, ROP, ...

(2)

There are far easier ways to cause serious harm to people driving cars than by putting stickers on stop-signs. Doing this is still illegal. If a cab driver wants to harm self-driving cars they could throw rocks at them. (Okay, a bit of a straw-man, but you get the idea.) Again, still illegal.

I think this argument is inherently wrong. The fallacy becomes more obvious if you rephrase this statement for a more acceptable threat model like card skimming:

There are far easier ways to steal credit cards than putting a skimmer on an ATM. Doing this is still illegal. If a thief wants to steal credit cards they could punch the victim and steal their wallet. (Okay, a bit of a straw-man, but you get the idea.) Again, still illegal.

carlini commented 5 years ago

@rahmati

(1) Definitely agree.

(2) Okay, maybe too weak of a strawman. I still believe the underlying point holds.

@adamshostack @earlenceferns

Writing something to this effect would be great. Fundamentally this is a paper on how to perform evaluations and I don't want to take the focus away from that, but insofar as the motivation for why the defense exists informs the way in which it should be evaluated, a discussion of this would be useful.

earlenceferns commented 5 years ago

@carlini Sounds great! I'll try to write something and send in a pull request, hopefully soon.
@adamshostack @ftramer @rahmati (tagging all commenters here) we should probably coordinate on this in case you are up for writing something.

adamshostack commented 5 years ago

@earlenceferns I'm happy for you to take the lead -- feel free to send me a draft for edit, tag me as a reviewer, or otherwise.

carlini commented 5 years ago

A somewhat early proof of concept showing that this type of attack could actually work on a production self-driving car. See Fig. 24 for the example images.

https://keenlab.tencent.com/en/whitepapers/Experimental_Security_Research_of_Tesla_Autopilot.pdf