globaleaks / GlobaLeaks

GlobaLeaks is free, open source software enabling anyone to easily set up and maintain a secure whistleblowing platform.
https://www.globaleaks.org

Prevent timing correlation attack on notification, by adding a random delay #264

Open fpietrosanti opened 11 years ago

fpietrosanti commented 11 years ago

Prevent timing correlation attack on notification, by adding a random delay.

The random delay for notifications should be tuned so that a correlation attack becomes difficult enough, while the system remains usable.

The admin must be able to configure the random delay pattern, including a very paranoid one (for example, sending an email only every 2-3 days).


vecna commented 11 years ago

Easiest solution: every time a notification email needs to be sent, simply make a percentage check. With probability (100 / [receiver_number * 2])%, send the email.
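
A minimal sketch of this percentage check, assuming the check is repeated on every scheduler run until it succeeds. The function names are hypothetical and are not part of the GlobaLeaks code base.

```python
import random


def send_probability(receiver_number: int) -> float:
    """Probability from the comment above: (100 / (receiver_number * 2)) percent."""
    if receiver_number < 1:
        raise ValueError("receiver_number must be at least 1")
    return 1.0 / (receiver_number * 2)


def should_send_now(receiver_number: int, rng: random.Random) -> bool:
    """Return True when the percentage check succeeds for this run."""
    # Assumption: in production rng would be random.SystemRandom(),
    # so that the outcome is not predictable from earlier outcomes.
    return rng.random() < send_probability(receiver_number)
```

With one receiver the email goes out on about half of the checks; with five receivers, on about one check in ten.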

fpietrosanti commented 11 years ago

I think it needs to be configurable as a random delay, per tip, with two configurable values: a lower and a higher threshold. The delay would be chosen randomly within this range. All receivers need to be notified at the same time, so that no receiver has an advantage over another in case they are competing with each other.
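
A rough sketch of this proposal, assuming one delay is drawn per tip and shared by all of its receivers. `schedule_tip_notification` and its parameters are hypothetical names used only for illustration.

```python
import secrets
from datetime import datetime, timedelta, timezone


def schedule_tip_notification(submitted_at, min_delay, max_delay, receivers):
    """Pick one random send time per tip and assign it to every receiver."""
    if min_delay > max_delay:
        raise ValueError("min_delay must not exceed max_delay")
    span_seconds = int((max_delay - min_delay).total_seconds())
    # One draw per tip, uniform over [min_delay, max_delay] in whole seconds.
    offset = secrets.randbelow(span_seconds + 1)
    send_at = submitted_at + min_delay + timedelta(seconds=offset)
    # Every receiver gets the same timestamp, so none is notified earlier.
    return {receiver: send_at for receiver in receivers}
```

For example, `schedule_tip_notification(now, timedelta(hours=2), timedelta(days=2), ["r1", "r2"])` returns the same send time for both receivers, somewhere between 2 hours and 2 days after `now`.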

hellais commented 11 years ago

I think this number is something that should be figured out by somebody that has knowledge about mix networks. I would suggest writing a post to a cypherpunk mailing list asking for what is the magic number or algorithm.

I think it should be based on reaching a certain threshold of messages to be delivered or some mix of that.

fpietrosanti commented 11 years ago

I think that this feature, like the widget, is something that does not need particular cryptographic properties, given the need to keep it simple and usable in that specific context (for example, I would avoid sending notifications to different receivers at different times, due to constraints specific to whistleblowing initiatives).

The goal is not to protect the receiver, but only to prevent a third party observing the receiver's notification traffic from immediately learning when a specific action was performed by a specific whistleblower.

Anyhow a post on cypherpunk is always welcome to get comments on that!

hellais commented 11 years ago

I would implement this feature at the same time as we implement a digest: send 1 email per day at a fixed time to every receiver, containing the list of leaks received that day or, if no leak happened, stating that no new leaks are available.

fpietrosanti commented 11 years ago

I think that the fixed timing approach would create a usability issue and would not allow fine-tuning of the security features depending on the context of use.

A specific context may require a minimum delay of 2 days, or a maximum delay of 2 hours (to handle quasi-interactive exchanges of comment notifications).

hellais commented 11 years ago

@fpietrosanti having the notifications always happen at a fixed time is the only way we can be sure that it actually provides extra security. Anything else (unless it's a smart trick that is provably secure) is a broken solution that provides no extra security, so we should not allow it or have it be configurable.

If you want to avoid timing correlation based on notification timing you will impact your user experience and that is not avoidable.

fpietrosanti commented 11 years ago

A timing correlation attack depends strictly on the whistleblower's context, and any amount of variation (be it 1 second or 10 days) does provide some protection against time correlation.

The only relevant point is to prevent a third party, who is observing the receiver's notification traffic and has some way of inferring or observing a whistleblower's timing pattern, from correlating those two patterns.

The amount of time depends on the whistleblower's context and cannot be fixed. For example, a fixed timing of 1 day could mean that a whistleblower gets caught, depending on the whistleblower's timing pattern.

Example Scenario: An initiative is soliciting whistleblowers from a specific organization that has tight access control and cameras where the information to be leaked is kept. The receivers are subject to monitoring due to the media hype around the initiative. There is fixed timing with a digest, 1 notification per day. A submission is sent by a whistleblower. A notification is sent at the end of the day, at the fixed time. The big brother observing the notification will know that the whistleblower made a submission "today". The big brother will review corporate access-control records and security camera footage and will make a list of suspects to be interrogated.

This is the reason why fixed timing is broken, and why only variable, configurable, random timing between two boundaries (lower and higher), chosen depending on the context, should be implemented.

In a scenario like this, the lower boundary of the random notification timing would need to be at least a few days.

hellais commented 11 years ago

@fpietrosanti in the scenario described above, if the notification is implemented using the digest in the way I described, the whistleblower will not be identified as having made a submission today, since a notification is issued every day whether a submission is made or not.

I think the above scenario is a good example scenario to benchmark other possible solutions to this problem.

Can you come up with another mechanism that protects the WB in the above scenario (which I find quite realistic) without impacting usability?

Another thing to add to the notification emails is padding so that the email payload is always fixed in size.

fpietrosanti commented 11 years ago

@hellais in the scenario described, the digest will not protect from time correlation, because the content of the notification is assumed to be monitored by the big brother, so he will know whether there was a submission or not. I think you are assuming this feature is used only if emails are encrypted (and with additional padding). I am assuming we do it independently of encryption and padding, assuming those emails are monitored.

In a scenario like this one, the lower threshold for notification would be X (let's say 7?) days and the higher threshold Y (let's say 15?) days. In Italy, surveillance cameras can keep records for 24 hours or, by exceptional authorization, for 7 days: http://www.garanteprivacy.it/web/guest/home/docweb/-/docweb-display/docweb/1003482 In that specific context the notification lower boundary could be 7 days, to be sure that the video recording is gone.

I think that, from the receiver's usability point of view, it's best to receive a couple of emails per month rather than 30 emails of which only 2 are relevant (because they contain a notification).

hellais commented 11 years ago

If the content of the email is not encrypted or the receivers are rogue, there is nothing we can do.

To prevent attacks related to that we would have to implement delays also in the creation of tips and receiver tips in particular. Otherwise a rogue receiver can just refresh their tip page until they see a new submission appear and boom they now have the exact timing of when a submission was done.

Even if you implement the threshold (unless you also delay the tip creation) you still lose.

I think it is reasonable to assume that the receiver emails are encrypted and that they are not rogue since the WB has decided to trust them with their data.

fpietrosanti commented 11 years ago

Speaking of the assumption: if a receiver's email is monitored, it does not mean that the receiver is rogue. For a receiver to be rogue, the receiver's own password for accessing globaleaks must be compromised.

A receiver can have a normal email account on an ISP where LEA have wiretapping capabilities, even if the receiver is using Tails and securely accessing the globaleaks platform.

I think that constant timing / digest is not a good way to go, because it adds effort and complexity on the receiver side in order to provide a specific protection on the whistleblower side:

The goal of the digest is to aggregate notifications only if there are many notifications, not if there is no notification.

fpietrosanti commented 11 years ago

After a discussion among the team, we came to the conclusion that this is a problem that we cannot easily settle.

The next step is to write an email to some cypherpunk mailing list, formulating the problem in a more structured way.

fpietrosanti commented 11 years ago

We will also write to zookoo asking for his support about this problem.

tomrittervg commented 11 years ago

So from a Mix perspective, I would recommend reading, or skimming, From a Trickle To a Flood (http://www.freehaven.net/doc/batching-taxonomy/taxonomy.pdf) - it talks about several types of pooling algorithms.

As I understand it, you want a leaker to submit something, and then an admin to get an email notification that something happened. The goal is that if an attacker is watching a number of users, they will see the notification to the admin go out, and the attacker will go and query their traffic logging database: "Who sent data right before this?" to do a correlation attack.

I've seen a few other threat models mentioned, things like rogue receivers. If protecting against them is a goal, you should outline it - for now I'm going to talk about two scenarios: the one above where the attacker CANNOT read the email to the admin (only see that there was an email), and the one where the attacker CAN read the email to the admin.

If we assume the attacker CANNOT read the email (because it is PGP encrypted, sent over TLS SMTP links, or for other reasons), then you can get very strong security by sending (as was suggested above) a fixed-length digest email out every N hours. (N=24, 12, 6, configurable). It's always X bytes long, whether those bytes are "No new items [random padding]" or "Item 1: xxxx Item 2: xxxxx 4 more items, login to view"
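
A minimal sketch of such a fixed-length digest body. The 1024-byte size, the padding marker and the `build_digest` name are assumptions made for illustration. The sketch only fixes the plaintext size; encryption and mail encoding may change the size on the wire and would need to be checked separately.

```python
import secrets
import string

DIGEST_SIZE = 1024            # assumed value for "X bytes"
MARKER = "\n--padding--\n"    # hypothetical separator before the padding
ALPHABET = string.ascii_letters + string.digits  # one byte per character


def _summary(titles, shown):
    """List the first `shown` items and summarise the rest."""
    if not titles:
        return "No new items"
    parts = [f"Item {i}: {t}" for i, t in enumerate(titles[:shown], 1)]
    hidden = len(titles) - shown
    if hidden:
        parts.append(f"{hidden} more items, login to view")
    return "\n".join(parts)


def build_digest(titles, size=DIGEST_SIZE):
    """Return a digest body that is always exactly `size` bytes long."""
    budget = size - len(MARKER)
    shown = len(titles)
    body = _summary(titles, shown).encode("utf-8")
    # Drop items from the end until the text fits in the budget.
    while len(body) > budget and shown > 0:
        shown -= 1
        body = _summary(titles, shown).encode("utf-8")
    if len(body) > budget:
        raise ValueError("size is too small for any digest")
    padding = "".join(secrets.choice(ALPHABET) for _ in range(budget - len(body)))
    return body + MARKER.encode("ascii") + padding.encode("ascii")
```

`build_digest([])` and `build_digest(["first leak", "second leak"])` both return exactly 1024 bytes, so the two cases differ only in content, not in length.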

If we assume the attacker CAN read the email, then the pooling algorithms described in the paper come into play to some degree. However, mix networks are based around mixing - mixing together n messages with m outputs and not being able to track which went where. You're not mixing. 2 inputs have 1 output. n inputs have 1 output. You want to disguise the correlation between any submission and the corresponding output. So I don't think 'mixing' actually helps you. Would a delay?

Imagine a random delay, say over the interval [1 second to 1 hour]. This gives the attacker a maximum window to search through - 1 hour. You can measure this security property very clearly. (Increasing the min window doesn't really buy you anything, I don't think, unless you model that the attacker is bounded in storage space, but I would not model an attacker thusly.)

Imagine a percentage-based scheme. Every hour on the hour, any new message has a 50% (or N%) chance of being sent. Well, I'm bad at statistics, but you can model the percentage confidence you have that an alert you observe was submitted in the past 1 hour window, 2 hour window, 3 hour window etc. And I also think in the 0-1 hour window, 1-2 hour window, 2-3 hour window, etc. Someone good at stats should help you with that.
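
The percentage-based scheme can be modelled with a geometric distribution. This is a sketch under two assumptions that are not stated in the thread: the hourly checks are independent, and the attacker has no prior knowledge of when submissions happen (all hours are equally likely). Under those assumptions, an observed alert was submitted in the k-th hour window before it with probability p(1-p)^(k-1), and within the last k hours with probability 1-(1-p)^k.

```python
def window_probability(p: float, k: int) -> float:
    """Probability that an observed alert was submitted k-1 to k hours earlier."""
    if not 0.0 < p <= 1.0 or k < 1:
        raise ValueError("need 0 < p <= 1 and k >= 1")
    # The message failed k-1 hourly checks and passed the k-th one.
    return p * (1.0 - p) ** (k - 1)


def cumulative_probability(p: float, k: int) -> float:
    """Probability that an observed alert was submitted within the last k hours."""
    if not 0.0 < p <= 1.0 or k < 1:
        raise ValueError("need 0 < p <= 1 and k >= 1")
    return 1.0 - (1.0 - p) ** k
```

With p = 0.5 the attacker's search window is unbounded in principle, but half of the probability mass already sits in the most recent hour and 87.5% within the last three hours.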

Having some idea that GlobaLeaks is designed to be as flexible as possible, it seems prudent that you implement flexibility. Right now, off the top of my head, I'm thinking:

Those are my thoughts anyway. Getting more input is definitely recommended. =)

fpietrosanti commented 11 years ago

Hey @tomrittervg , thanks a lot! Your inputs are very valuable!

To give you some more input on the assumption:

Some consideration we have done on the two method (fixed time sending/digest vs random delay between lower/upper boundary):

Should we follow the random delay approach, but somehow add to the algorithm a variable for "how many submissions / how often the globaleaks node receives a submission"?

hellais commented 11 years ago

In order to better evaluate this feature I am going to try and formalise the problem a little bit and suggest some possible adversary models we are interested in mitigating against.

Overview

When a whistleblower submits to a globaleaks node all receivers that they have selected as recipients for their submission will receive a notification informing them that a new submission has occurred. Other whistleblower interactions also trigger a notification (that should therefore be protected from timing attacks) and such interactions are:

Goals

We are interested in mitigating correlation attacks based on the dispatching of notifications for interactions performed by a whistleblower. It should not be possible (or harder) for an attacker to determine which person is a whistleblower for a certain submission based on their capabilities (more on that below).

Adversary model A

Their goal is to find which user has performed a certain submission on a globaleaks node.

This adversary has the following capabilities:

Adversary model B

This adversary has all the capabilities of the above adversary, but they do not have the ability of reading the content of the notification messages.

Adversary model C

All of the above except the receiver is not trusted: their goal is to de-anonymise the WB.

?? Question:

Is this any different from Adversary A, that is an adversary that has the ability to read the notification emails because they are not encrypted?

Example real world scenario

The GL node is a GL node for a private company. The adversary is a Manager of The Company who wants to find out who blew the whistle on the fact that he is laundering money through a shell company in the Isle of Man. Since they are on the receiver list, because the globaleaks node was configured to have a plurality of receivers, they will be able to read the content of the notification emails.

The whistleblower decided to blow the whistle from their office, and the office network has a proxy that logs every HTTP request being made.

When the Manager receives a notification that a new submission has been made to the globaleaks site, they take the timestamp of that notification and look at the traffic logs for that period to see who was generating traffic at the time. Based on this, they should not be able to distinguish a normal Tor user (or people loading the cover traffic widget) from the whistleblower.

@fpietrosanti do you agree with this model?

Do we want to protect against all of these adversaries? Are there other possible adversaries we should protect against?

Implementation notes: If we implement this feature under the assumption that receivers are not to be trusted, the random delay should also be applied to the creation of tips. That is, the tip list API should not return the newly submitted tip until some amount of time has passed (the same delay that was applied to the notification scheduling).
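
A minimal sketch of this idea, assuming each tip stores the scheduled notification time next to its creation time. The `Tip` class and `visible_tips` function are hypothetical and do not mirror the real GlobaLeaks models or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Tip:
    tip_id: str
    created_at: datetime  # real submission time, never shown to receivers here
    notify_at: datetime   # created_at plus the random notification delay


def visible_tips(tips, now):
    """Return only the tips whose notification delay has already elapsed."""
    # The receiver is given notify_at rather than created_at, so polling
    # the list reveals no more than the notification email does.
    return [
        {"id": tip.tip_id, "date": tip.notify_at}
        for tip in tips
        if tip.notify_at <= now
    ]
```

A receiver who refreshes the list continuously then sees the tip appear at the delayed time, not at submission time.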


@tomrittervg thank you for the comments and the suggested paper.

Another thing worth considering is the fact that we have implemented a cover traffic widget. Currently this widget is very simple and just sends requests spaced with a random interval and with random sizing.

I believe there is a smarter way of doing this widget and perhaps there is a way of tuning it so that it works best with the configured anti timing correlation method. I wrote down some notes on how I would improve the widget on this ticket.

Specifically on the topic of notification security if we have the ability of running any software on the receivers machines and we assume that receivers are trusted (that is their goal is not to de-anonymise the whistleblower) we could have them run a software on their machines to dispatch notifications.

This software would pull from the GlobaLeaks node /tip API to learn of new submissions. The response must be of constant size and it must be authenticated and end-to-end encrypted. This software could then either expose a UI of its own to notify the user of a new submission, or drop a new email in the user's local maildir that would then allow them to read the notification from their email client.

The fact that we have a widget capable of generating cover traffic makes our system much more similar to a mix network since we have N inputs to which correspond M outputs where M << N.

I will take a deeper look at that paper and then perhaps ask this in some mailing list.

@tomrittervg do you have a suggestion for the ideal mailing to send this message to?

hellais commented 10 years ago

This is a discussion that happened on IRC on the topic:

13:29 < hellais> armadev, nickm: do you have any advice on this https://github.com/globaleaks/GlobaLeaks/issues/264
13:31 < hellais> the last comment explains what is the kind of threat we are interested in dealing with
                 (https://github.com/globaleaks/GlobaLeaks/issues/264#issuecomment-22932337)
13:47 < armadev> "no no, i was leaking that other thing" is probably not a defense you want
13:48 < armadev> s/was/might have been/
13:50 < armadev> it seems it really comes down to how many other people in the company are using tor for other things
13:51 < hellais> armadev: well not necessarily, because of two reason: 1) The WB may be submitting over Tor and it's not necessary that they are the only Tor user on the office
                 network 2) The WB may be using the Tor2web node and the cover traffic widget will have been loaded by some other people inside the company
13:51 < armadev> whatever your cover traffic model is, it seems clear that the wb should submit her traffic the same way
13:51 < hellais> see https://github.com/globaleaks/GLClient/blob/master/app/decoy.html#L80 and https://github.com/globaleaks/GlobaLeaks/issues/534
13:52 < armadev> e.g. if the cover traffic people connect once an hour, the wb could run it too, and queue her traffic to go out in place of one of the covers
13:53 < hellais> armadev: the fact is that the cover traffic widget is something that only generates traffic when a user visits a website that is hosting it, so it's not
                 anticipatable when such thing will happen
13:53 < armadev> hum.
13:53 < armadev> that's not ideal
13:53 < hellais> armadev: the problem I am trying to solve is to simply improve the current situation that is trivially attackable by timing correlation
13:54 < hellais> because when a submission is performed a notification email is immediately sent
13:54 < armadev> when you decide on an answer, you should tell wordpress about your answer. they have the same problem.
13:54 < armadev> (and have implemented a quite simple solution so far)
13:55 < hellais> that means that somebody monitoring notification emails (either because they are a rogue receiver or they are monitoring receiver email accounts) can just look
                 at the timestamp of the notification and pull out all the users that did suspicious traffic around that ti
13:55 < hellais> *time
13:55 < hellais> armadev: huh, wordpress? How so?
13:56 < armadev> hellais: they added a feature to delay the appearance of your blog post
13:56 < armadev> years ago. at the insistence of ethan and sami.
13:56 < hellais> armadev: but in the wordpress case it's not a mix, because you only have 1 input and 1 output.
13:56 < armadev> seems like a similar problem
13:56 < armadev> correct.
13:56 < armadev> i think your solution will not be a mix either.
13:57 < hellais> armadev: I guess I could rephrase the question differently: would a threshold or timed mix (or combination) buy us anything? If not, why not?
13:59 < armadev> hellais: you definitely don't want a threshold mix, if your adversary can fill it with messages until it pops out
13:59 < armadev> then the attack is "wait for alice to connect, then flush her [potential] message"
13:59 < armadev> for each alice that connects
14:00 < armadev> hellais: one option is a pool mix, where every cover connection is just a dummy event
14:01 < armadev> hellais: so at every time cutoff, you take a fraction of messages from the pool, and either deliver or discard them. (or a number of messages)
14:01 < armadev> with the goal of maintaining a pool of p messages
14:02 < armadev> if it were just a timed mix, then when a message gets delivered, you know that it came from one of the senders in the previous round. that could lead to
                 intersection attacks.
14:02 < armadev> http://freehaven.net/anonbib/#diaz:pet2003
14:02 < armadev> http://freehaven.net/anonbib/#SN03
14:03 < armadev> and oh hey, you could use the alpha mixing design here too. :)
14:03 < armadev> (it is likely overkill)
14:04 < armadev> but if you are giving your users a safety slider, that's what alpha-mixing is for
14:10 < armadev> i'd go for taking a fraction out of the pool,
14:10 < armadev> since most of the messages are dummies so it is hidden how many you actually took out
14:10 < armadev> (that is, a fraction such that the expected value remaining is p)
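
The pool idea from the log above can be sketched as follows. This is an illustration under assumptions, not a reviewed mix design: dummies are represented as `None`, every message leaves the pool independently with the same probability, and `flush_round` is a hypothetical name.

```python
import random

DUMMY = None  # assumed marker for a cover-traffic (dummy) event


def flush_round(pool, target_size, rng):
    """Run one timed round of a pool mix.

    Each message leaves with probability (n - target_size) / n, so the
    expected number of messages left in the pool is target_size.
    Returns (kept, delivered, discarded_dummies).
    """
    n = len(pool)
    if n <= target_size:
        return list(pool), [], 0
    leave_probability = (n - target_size) / n
    kept, delivered, discarded = [], [], 0
    for message in pool:
        if rng.random() < leave_probability:
            if message is DUMMY:
                discarded += 1             # dummies are silently dropped
            else:
                delivered.append(message)  # real notifications are sent
        else:
            kept.append(message)
    return kept, delivered, discarded
```

In a real deployment `rng` would presumably be `random.SystemRandom()`. Because most entries are dummies, an observer who sees only the delivered notifications cannot tell how many messages left the pool in a given round.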

fpietrosanti commented 10 years ago

Discussion started also on cryptography mailing list: http://lists.randombit.net/pipermail/cryptography/2013-August/005052.html

fpietrosanti commented 10 years ago

I really feel that this feature should be implemented in a simple way, because I really think that no solution can fix the problem with a reasonable degree of usability.

As with the widget, we are not fixing a problem but making it more difficult to exploit.

So, as also suggested by @tomrittervg, the random delay with lower/upper boundaries and an easy-to-select threshold would be the best solution.

We just need to be fair in communicating this feature as something that "makes timing attacks more difficult" but not something that "prevents timing attacks".

fpietrosanti commented 10 years ago

Mario Heiderich also agrees with Tom Ritter's proposal.

Below is an excerpt from a Skype chat with him: Mario Heiderich: I read up on #264 - and fully agree with Tom Ritter. I think the ticket discussion has arrived at a proper conclusion: Raising the bar to mitigate timing attacks - yet communicating that it doesn't fully prevent a strong attacker from deriving sensitive information via timing and traffic correlation

fpietrosanti commented 10 years ago

Added security tag. This is an open security bug from a past pentest.

vecna commented 9 years ago

About this: after #1072 the notification procedure has been completely refactored. We now have a notification status in the DB, and implementing a mail digest has become very simple.