Address reviewer comments

defuse commented 8 years ago

Dear authors,

The 9th USENIX Workshop on Offensive Technologies (WOOT '15) program
committee is sorry to inform you that your paper #17 was rejected, and will
not appear in the conference.

       Title: Distinguishing Inputs with the FLUSH+RELOAD Cache Side
              Channel
     Authors: Taylor Hornby (University of Calgary)
              John Aycock (University of Calgary)
  Paper site: https://woot15.usenix.hotcrp.com/paper/17?cap=017a1_cQ9NL6QjM

20 papers were accepted out of 57 submissions.

Reviews and comments on your paper are appended to this email. The
submissions site also has the paper's reviews and comments, as well as more
information about review scores.

Contact woot15chairs@usenix.org with any questions or concerns.

Aurélien and Thomas,
WOOT '15 PC co-chairs

===========================================================================
                           WOOT '15 Review #17A
---------------------------------------------------------------------------
 Paper #17: Distinguishing Inputs with the FLUSH+RELOAD Cache Side Channel
---------------------------------------------------------------------------

                      Overall merit: 4. Accept
                 Reviewer expertise: 4. Expert

                         ===== Paper summary =====

In this paper, the authors describe a method that uses Flush+Reload to distinguish between a set of possible inputs. They do so by profiling the order in which functions are called by applications, for different inputs. The method comprises three steps: a training phase (with a set of inputs), an attack stage and the recovery stage. They thus show that cache attacks can be used to compromise the privacy of users, when shared memory is enabled.

                      ===== Comments for author =====

* Pros
- Cache attacks have been gaining momentum over the last few years; it is interesting to see how broad these attacks can be, and in particular what kind of attacks can be performed.
- The authors performed three different experiments (on Links, Poppler and TrueCrypt) to show that their approach is generic and applies to different applications.  

* Following Remarks/Questions on Genericity
- For the Links (resp. Poppler) experiments, 100 (resp. 127) is not a big training set. I would have liked to know how the recovery behaves when the training set grows.
- Why Links and not a more popular browser? Is it linked to the size of the binary (and thus the number of functions to choose from)? You should clarify this in the paper.
- You also didn't explain your choice of selecting only cache lines that correspond to function entries. Selecting among all the addresses accessed by the binary might give you more fine-grained information. You could also attack any binary, regardless of whether you have the symbols.

* Related Work
There are really interesting (and also very much concurrent) works related to yours. I suggest you cite the following, which are of interest and directly related to your paper:
- "The Spy in the Sandbox -- Practical Cache Attacks in Javascript", by Yossef Oren, Vasileios P. Kemerlis, Simha Sethumadhavan, Angelos D. Keromytis. In http://arxiv.org/abs/1502.07373
While using Javascript and not native code, Oren et al. use a cache side channel to implement a mouse/network activity logger, directly impacting the privacy of users.
- "Cache Template Attacks: Automating Attacks on Inclusive Last-Level Caches", by Daniel Gruss, Raphael Spreitzer, and Stefan Mangard. In Usenix Security 2015 (https://www.usenix.org/conference/usenixsecurity15/technical-sessions/presentation/gruss)
Gruss et al. designed a generic technique that profiles cache-based information leakage for any binary.
If the paper is not yet available at the time you get this review, you can contact Daniel Gruss who will happily give you a copy of the paper. Their code is here and is documented so that you can see the relevance to your work and what they did: https://github.com/IAIK/cache_template_attacks

* Minor remark
Table 1: "unified" is usually used to describe a level that contains both data and instructions, not to describe a "shared" cache between cores. The L1 of the CPU on System 2, "Xeon E3-1245 v2", is neither shared nor unified. A clearer way to write the specifications would be along these lines:
  "L1: 4x32 KB data + 4x32 KB instructions"
  "L2: 4x256 KB unified"
  "L3: 8 MB shared unified"
Same for the Intel Core 2 Duo P8700:
  "L1: 2x32 KB data + 2x32 KB instructions"
  "L2: 3 MB shared unified"

You could also write the number of cores of each processor to make it clearer if need be.

===========================================================================
                           WOOT '15 Review #17B
---------------------------------------------------------------------------
 Paper #17: Distinguishing Inputs with the FLUSH+RELOAD Cache Side Channel
---------------------------------------------------------------------------

                      Overall merit: 2. Weak reject
                 Reviewer expertise: 2. Some familiarity

                         ===== Paper summary =====

This demonstrates using the FLUSH+RELOAD technique for one process to spy on the code execution pattern of another by timing cache accesses.  Most work in this area uses such side-channels to attack crypto.  This paper demonstrates distinguishing non-crypto program behavior, such as which website was loaded in a browser.

                      ===== Comments for author =====

This is a nice demonstration of the FLUSH+RELOAD attack, the methodology for preparing and training the attack is interesting, and it's interesting to see what information can be extracted from different apps.

However, it's not clear how exactly this improves on "Cross-Tenant Side-Channel Attacks in PaaS Clouds" from ACM CCS 2014, which is not cited.

That paper uses FLUSH+RELOAD to recover non-cryptographic data from a shopping app (the number of items in a shopping cart).  So it's not true that "this is the first time that a generic cache-based side channel has been used to compromise privacy by attacking a non-cryptographic application".  

Moreover, that paper demonstrates the attack in a shared-VM environment targeting server apps, which is probably more practical and difficult.

The authors should clarify this paper's relationship to prior work.

===========================================================================
                           WOOT '15 Review #17C
---------------------------------------------------------------------------
 Paper #17: Distinguishing Inputs with the FLUSH+RELOAD Cache Side Channel
---------------------------------------------------------------------------

                      Overall merit: 3. Weak accept
                 Reviewer expertise: 2. Some familiarity

                         ===== Paper summary =====

Here we are presented with a noncryptographic application of the FLUSH+RELOAD attack. The authors use cache timing to detect, across user accounts on the same machine, which of many similar PDF files has been opened, which Wikipedia link has been accessed using `links` and, in a more cryptographic application, whether a TrueCrypt volume is a hidden volume. The authors also devise a semi-automated method to find good cache lines to FLUSH+RELOAD on.

                      ===== Comments for author =====

I liked this paper. It was well-written and well-motivated. While this class of side-channel attack is of course not new, this is an interesting application of it that can end up being useful in an elaborate attack on, say, cross-VM scenarios.

Which leads to the question: why was the cross-VM scenario not explored here? Was it for convenience reasons, like it is pointed out for the same user/different user scenario, or does the presented method not work there? I would advise to clear this up. (The cross-VM scenario is treated at the very end, but it is in the context of shared _pages_, which is a leakier side-channel.)

One nitpick in Section 7: the authors claim that this is the first noncryptographic exploration of cache-timing attacks. But in the very next sentence they point out another prior cache-timing attack which is also noncryptographic. Is the claim specifically on _usermode_ cache timing? If so, I would advise the authors to make the claim clearer.

In the same section, the authors list a number of prior side-channel attacks against various software. Another interesting instance of such an attack is pakt's (not cache-) timing attack on JavaScript hash tables [*], which can be used to bypass ASLR by leaking an object's pointer.

[*] https://gdtr.wordpress.com/2012/08/07/leaking-information-with-timing-attacks-on-hashtables-part-1/

defuse commented 8 years ago

Those reviews are actually way more positive than I remember!

defuse commented 8 years ago

I think as a matter of strategy we just shouldn't claim to be the first of any type of attack. It's too easy for us to miss concurrent work or something published on some random blog, and get rejected for that reason, even though our attacks are good. Let's just motivate it something like this:

"There's a lot of focus on crypto attacks, here's some work that attacks non-crypto stuff. The simplest kind of attack is an input distinguisher (all other kinds of attacks, e.g. crypto key leakage and the shopping cart one, imply the existence of input-distinguishing attacks). We found some input distinguishing attacks so that suggests more might be possible."

defuse commented 8 years ago

Better rationale:

There's a lot of focus on crypto attacks and not much on non-crypto (with these exceptions [1,2,3...]).
Input distinguishing attacks are a prerequisite to all kinds of attacks (crypto attacks imply key distinguishing, and all of [1,2,3,...] imply distinguishing attacks), so defending against those attacks implies defending against input distinguishing attacks.
Input distinguishing attacks suggest there might be even better attacks.
We could have tried to find ad-hoc attacks (like the TrueCrypt one, or some spy-on-texteditor-keystrokes thing) but then we would only be learning about those ad-hoc attacks, not learning things about a whole space of attacks, which we are learning about by trying to automate the process of input distinguishing.

defuse commented 8 years ago

The TrueCrypt one is actually not ad-hoc as it can be seen as distinguishing between classes of inputs, which implies being able to distinguish between two specific inputs.

Any F+R attack can be constructed by repeating different kinds of input-class-distinguishing attacks. Take the attack and encode its output (the leaked information) into N bits, then define N pairs of input classes where the i-th pair is ("inputs with i-th bit 0", "inputs with i-th bit 1"). So there is (mathematically, who knows about practically) always a progression from input class distinguishing attacks to the attack you're actually going for. This is uninteresting except that (1) If the progression exists in practice not just in math land, then it leads to easily turning distinguishing attacks into full leakage attacks and (2) A defense that promises to break such progressions (e.g. by making which input class is being measured unknown) will break the full leakage attacks even though it won't break the individual distinguishers, which could be good enough.
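
A toy sketch of the bit-by-bit construction above, in Python. Everything here is hypothetical and only lives in "math land": `make_bit_distinguishers` and `recover` are made-up names, and each distinguisher is a perfect oracle that is handed the victim's secret directly. A real F+R distinguisher would only see noisy cache probes and would have to infer the class from them.

```python
from typing import Callable, List

# A distinguisher answers one question about the victim's input:
# does it fall in class 0 ("i-th bit is 0") or class 1 ("i-th bit is 1")?
Distinguisher = Callable[[int], int]


def make_bit_distinguishers(n_bits: int) -> List[Distinguisher]:
    """Build one idealized input-class distinguisher per output bit.

    ASSUMPTION: each distinguisher is a perfect oracle. In a real attack
    it would be replaced by a FLUSH+RELOAD measurement plus a classifier.
    """
    def make(i: int) -> Distinguisher:
        def distinguish(victim_input: int) -> int:
            # Stand-in for "observe the victim and decide which class".
            return (victim_input >> i) & 1
        return distinguish

    return [make(i) for i in range(n_bits)]


def recover(victim_input: int, distinguishers: List[Distinguisher]) -> int:
    """Rebuild the full N-bit leak by running all N class distinguishers."""
    leaked = 0
    for i, distinguish in enumerate(distinguishers):
        leaked |= distinguish(victim_input) << i
    return leaked


if __name__ == "__main__":
    ds = make_bit_distinguishers(4)
    print(recover(0b1011, ds) == 0b1011)
```

The defense idea in (2) corresponds to the attacker not knowing which index `i` a given measurement belongs to: each individual distinguisher still works, but the `leaked |= ... << i` step can no longer be carried out.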

defuse commented 8 years ago

I opened tickets for each of the individual comments.

defuse / flush-reload-attacks

Address reviewer comments #2