freedomofpress / securedrop

GitHub repository for the SecureDrop whistleblower platform. Do not submit tips here!
https://securedrop.org/

Prevent automated submissions (CSRF) #4

Closed klpwired closed 9 years ago

klpwired commented 11 years ago

EDIT: Renamed title to reflect current discussion.

bitsteak commented 11 years ago

Many existing anti-CSRF solutions use cookies, like the cookbook solution provided by the web.py docs. However, existing DeadDrop users prefer not to use cookies at all, to help protect the identity of sources. For example, the New Yorker states:

"To help protect your anonymity, Strongbox is only accessible using the Tor network (https://www.torproject.org). When using Strongbox, The New Yorker will not record your I.P. address or information about your browser, computer, or operating system, nor will we embed third-party content or deliver cookies to your browser."

We'll have to get creative to solve this problem.

bitsteak commented 11 years ago

Thinking about this a bit more, what we're really looking to do is prevent automated submissions. CSRF is not really a security issue.

djon3s commented 11 years ago

Just following on from what @bitsteak said: I have a feeling that automated submissions will be a problem in the long term. An easy DoS would be to send a large volume of material, clogging up the resources of a news organization that has to go through a lengthy process to decrypt and access that information.

bitsteak commented 11 years ago

Agreed, this ticket is all about anti-automation and rate-limiting.

ferrouswheel commented 11 years ago

Web frameworks like Django allow you to embed CSRF tokens in HTML templates.

You can then either require a CSRF token as a hidden input in HTML form submissions, or render it in a JavaScript script block and add it to the headers of all AJAX requests.

ioerror commented 11 years ago

Automation of submission isn't too much of a problem, I think. In any case - the Tor Browser Bundle handles cookies in a reasonable fashion.

bitsteak commented 11 years ago

@ioerror I'm not sure you read the ticket. We're trying to prevent automated submissions in a way that satisfies the privacy statement given by our existing users.

ioerror commented 11 years ago

I have read the ticket, I don't think that automated submissions are that much of a problem. If they become a problem, I would sort it out on the backend. However, my understanding is that this was also a ticket about CSRF - so the second half of my comment should still apply.

Taipo commented 10 years ago

In any case - the Tor Browser Bundle handles cookies in a reasonable fashion.

The Firefox browser is not the only way an attacker can send requests to a SecureDrop. An attack tool proxied through Tor, for example, is another.

Secondly, SecureDrop's security still depends on a web host keeping their web server and Linux distro up to date. These are what attackers will also take aim at via remote form POST requests (CSRF), circumventing any XSS protection that TBB depends on via the NoScript add-on in Firefox.

The New Yorker will not record your I.P. address or information about your browser, computer, or operating system, nor will we embed third-party content or deliver cookies to your browser.

As long as the whistleblower has Flash disabled and NoScript functioning correctly, TBB should deliver a whistleblower to a SecureDrop without an Evercookie tracking them.

I am not sure of the rationale behind the New Yorker's cookie policy. The idea that a journalist whistleblower website would ever conceive of delivering a malicious cookie to a whistleblower is incomprehensible to me, and if they did, they certainly wouldn't admit to it. So perhaps they have presented an argumentum ad populum. It's not having cookies enabled when you visit a SecureDrop that is the security risk, and it's certainly not a cookie that a SecureDrop server would send to your TBB browser (if session cookies were used, for example) that is the risk, especially considering cookies are deleted when you close Tor. Rather, it's having a combination of JavaScript and Flash enabled when you visit an NSA-run, Evercookie-inserting drop site that is the potential "cookie" risk.

diracdeltas commented 10 years ago

Can someone explain why we're trying to avoid using cookies (besides the fact that for New Yorker in particular, it's against the current privacy policy)?

One way to get around this would be to use the hash of the source's codename as a CSRF token, since we store that anyway on the server.

fpietrosanti commented 10 years ago

Automated submission floods are a PITA; some high-traffic adopters of GlobaLeaks were subjected to a DoS attempt with many submissions that were very difficult to manage. This created an annoying storm of notifications to the many journalists acting as receivers.

GlobaLeaks has anti-XSRF in place, implemented as described here: https://docs.google.com/a/apps.globaleaks.org/document/d/1SMSiAry7x5XY9nY8GAejJD75NWg7bp7M1PwXSiwy62U/pub#h.ft361cd0nhl4 but this did not prevent us from receiving tons of submissions.

At GlobaLeaks we are going to introduce CAPTCHAs and some threshold/timing mechanisms to make this kind of DoS more difficult: https://github.com/globaleaks/GlobaLeaks/issues/297

diracdeltas commented 10 years ago

@fpietrosanti Thanks for all the input! It sounds like we have a lot to learn from GlobaLeaks.

I checked, and we do sort-of have anti-CSRF protection because the source's secret codename is a hidden input field in the form where they submit documents. I don't know if the server rejects POSTs either without codenames or with codenames that don't yet exist. (If not, it should. Though I think hashing the codename first and only storing that would be better.)

But you're right that this doesn't prevent (D)DOS attacks.

fpietrosanti commented 10 years ago

@diracdeltas About the "flood": I think it's worth brainstorming some kind of "common specification" to mitigate the impact of tons of submissions being sent, from the security (e.g. CAPTCHAs and some crypto/proof-of-work), technical (e.g. statistically-based exponential-delay thresholds), workflow (e.g. aggregating notifications), and human (e.g. introducing human moderation in the presence of floods) points of view. Not having the IP address available as a parameter introduces a lot of complexity in handling this stuff, and it's still an open issue!
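The crypto/proof-of-work idea mentioned here can be sketched as a hashcash-style challenge. This is a hypothetical illustration, not GlobaLeaks' actual implementation: the server hands out a random challenge, the client must find a nonce whose hash has a number of leading zero bits, and verification on the server is a single cheap hash:

```python
import hashlib
import secrets

def make_challenge():
    """Server side: a fresh random challenge per submission attempt."""
    return secrets.token_hex(8)

def check(challenge, nonce, difficulty_bits):
    """Server side: verification costs one hash, so it stays cheap."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    value = int.from_bytes(digest, "big")
    # Valid iff the top `difficulty_bits` bits of the digest are zero.
    return value >> (256 - difficulty_bits) == 0

def solve(challenge, difficulty_bits):
    """Client side: brute-force a nonce; expected cost ~2**difficulty_bits hashes."""
    nonce = 0
    while not check(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

The asymmetry (expensive to produce, cheap to verify) is what throttles automated floods without needing the submitter's IP address.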

Taipo commented 10 years ago

I checked, and we do sort-of have anti-CSRF protection because the source's secret codename is a hidden input field in the form where they submit documents.

Your average packet capture would grab a list of all the POST fields, including hidden ones, and using that template an attacker can set their tools or remote forms to auto-fill the fields: fixed data for mandatory fields and random data for the rest, meeting the criteria as any normal browser would. So unless a specific field value (i.e. the source's secret codename) is randomly generated and also expected by the server, as is the case with tokens, it's not effective in combating forged remote form submissions.

Most non-browser attack tools do not understand cookies, or if they do, they will return an expired cookie when they shouldn't; in most cases they also do not understand JavaScript challenges posed to them. A forged form submission, on the other hand, could be sent from a browser not as a flood but as a single submission, making it difficult to discern from amongst legitimate form submissions.

There are common solutions for both, but the thing here is that I think there is a belief that this can all be done without requiring JavaScript and cookies to be enabled in the Tor browser. Once we're over that hurdle, we're back to the suggestion above from Ferrouswheel and Bitsteak:

...then you can either require a csrf token hidden input with html form submission... or by rendering in a javascript script block and adding it to the headers of all ajax requests.

Or both...

So as I see it, it's either CSRF tokens or CAPTCHAs.

diracdeltas commented 10 years ago

@Taipo: The codename is both randomly generated and expected by the server (if not, it's a bug). It's also used as the unique identifier for sources. I might be misunderstanding you.

Does anyone see a reason not to just use the codenames as both unique source identifiers and CSRF tokens? I suppose it would be better for CSRF tokens to be session-based instead of persistent.

Taipo commented 10 years ago

No I think you got it right, I do not have SecureDrop set up here yet so was making a number of assumptions. It sounds to me like you have solved the POST flood and remote forged form problem, it just needs testing I suppose. Once I get things configured here I can start attempting to take it apart rather than merely postulating theories in my head.

diracdeltas commented 10 years ago

@Taipo: Cool! For testing the source and journalist interfaces, all you need to do is run the instructions at https://github.com/freedomofpress/securedrop/blob/master/modules/deaddrop/files/deaddrop/HACKING.md to set up a Python virtualenv, then "python {source,journalist}.py" and go to localhost:8080. It's probably not at all obvious that this is the procedure. :)

fpietrosanti commented 10 years ago

@diracdeltas Is SecureDrop providing a JavaScript-enabled interface, or is it pure HTML? Depending on this, you may design the anti-XSRF feature one way or another. https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)_Prevention_Cheat_Sheet I'd suggest just using an existing web framework, or copying an existing secure CSRF implementation as-is, to avoid reinventing the wheel and possibly introducing bugs.

diracdeltas commented 10 years ago

I take back what I said earlier; the source server doesn't check for valid codenames. So we don't actually have any anti-CSRF at the moment.

garrettr commented 10 years ago

Can someone explain why we're trying to avoid using cookies (besides the fact that for New Yorker in particular, it's against the current privacy policy)?

@diracdeltas There is no specific technical reason not to use cookies. They are somewhat verboten because they are often used for tracking and analytics purposes. I think there have been some "leak sites" in the past that have promised anonymity for sources, but then have been (rightfully) mocked for using Google Analytics or similar services, which of course leaks their visitor's identities to a 3rd party.

There may also be a concern because browsers save cookies, and they can be used to identify what sites a user (like a potential source) has visited if you have access to their computer. This is the same concern presented by browser history, and can be mitigated in the same ways (private browsing, clear history/cookies, or use TBB which does not persist history or cookies).

Does anyone see a reason not to just use the codenames as both unique source identifiers and CSRF tokens?

At the moment, we do not need CSRF mitigation because we do not use cookies to authenticate users. Instead, we "hand off" the user's codename from page to page, which serves as their authentication token. An attacker trying to perform actions on a user's behalf would have to guess this value, and since it is a Diceware passphrase as long as it is >= 7 words this should be too "hard".

This is not a perfect solution. We can only do this because the site is so simple and has a limited tree of possible interactions the user can take. If we want to make the site more complex, this may become unwieldy and a potential source of bugs. In that case, we might want to use cookies for authentication, in which case we should implement standard CSRF protections (limit cookie lifetime, use synchronizer tokens). The link shared by @fpietrosanti above is a great resource.

Additionally, the entropy (and the resistance to an impersonation attack) is only as good as the codename, and we recently made changes that allow users to pick shorter (down to 4 words) codenames. These are trivial to guess, especially since we currently don't implement any kind of request rate limiting or automated submission protection.
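For reference, the entropy of a Diceware codename is easy to work out, assuming the standard 7776-word (6^5) list:

```python
import math

DICEWARE_WORDS = 7776  # 6^5 entries in the standard Diceware list

def codename_entropy_bits(num_words):
    """Each word contributes log2(7776) ~= 12.92 bits of entropy."""
    return num_words * math.log2(DICEWARE_WORDS)

# 7-word codenames: ~90.5 bits -- infeasible to guess online.
# 4-word codenames: ~51.7 bits -- uncomfortable without rate limiting.
```

This is why the combination of short codenames and no rate limiting is the dangerous case: the guessing cost drops by roughly 2^39 going from 7 words to 4.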

I take back what I said earlier; the source server doesn't check for valid codenames. So we don't actually have any anti-CSRF at the moment.

The source server does check for valid codenames - see store_endpoint in source.py. If someone tries to submit a document/message with an invalid (not previously generated by the web application) codename, the processing will abort and they will get a 404.

garrettr commented 10 years ago

As I explained above (and as @bitsteak noticed a while back), CSRF is not an issue given the current architecture. This bug should be focused on preventing DoS through automated submissions, and I've changed the issue title to reflect this.

This bug has also gotten quite unwieldy. I would encourage people who want to work on this to pick a specific anti-DoS mechanism and open a separate issue to track its development.

Taipo commented 10 years ago

FYI

POST 127.0.0.1:8081/generate/ HTTP/1.1
Host: 127.0.0.1:8081
Accept: */*
Content-Type: application/x-www-form-urlencoded
Content-Length: 19

number-words=999999

This causes a bit of CPU bother, especially when jackhammered at a server. It needs to be tested on a server in the wild, though, to see just how many resources could be used up via Tor, which is much more restrictive than hammering this locally. Merely restricting this using form variable limits is not enough.
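Server-side, a request like this calls for clamping the field to a sane range before doing any work. A minimal sketch (the function name, default, and bounds are hypothetical; the actual limits are whatever the UI offers):

```python
MIN_CODENAME_WORDS = 4   # hypothetical lower bound
MAX_CODENAME_WORDS = 10  # hypothetical upper bound

def parse_number_words(raw, default=8):
    """Parse the `number-words` form field, falling back to a safe default
    so a value like 999999 can't be used to burn CPU."""
    try:
        n = int(raw)
    except (TypeError, ValueError):
        return default
    if not MIN_CODENAME_WORDS <= n <= MAX_CODENAME_WORDS:
        return default
    return n
```

Rejecting out-of-range input before generating anything means each bogus request costs the attacker a round trip and the server almost nothing.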

Taipo commented 10 years ago

There are a number of DoS attacks, ranging from attacks against the HSDir system, which are outside the scope of SecureDrop (see http://donncha.is/2013/05/trawling-tor-hidden-services/ for one such example), through to direct denial of service attacks on the source server, or even on the network connection of the journalists' in-house network.

More relevant to this issue are DoS attacks that attempt to overwhelm the CPU (see above for a simple example), or fill the HD space or available bandwidth of these journalist-housed web servers.

Then there are just generally annoying spam-type attacks with the aim of filling a server with many thousands of fake source submissions. But by far the biggest risk faced by a source is a brute force attack on the lookup feature attempting to guess a source codename.

I do not see any problems with using CSRF token methods as a means of preventing automated POST form submissions. It has its limits, but it does not require a source to change the default configuration of the TBB, and it would mitigate most of these types of DoS attacks.

Keeping it simple leaves a heavily resourced adversary very little room to move.

If an average adversary were presented with such a simple form submission, with no field validation issues and no effective means to flood a server's resources, they would eventually move on to the file upload feature and look at what damage could be inflicted by, for example, tricking a journalist into opening a malicious file.

A well resourced adversary on the other hand might have bigger fish to fry.

Because of the structure of SecureDrop, there will no doubt be a tendency for media companies to house these servers in-house, thus weakening one part of the Tor hidden service model, where the physical location of the server is unknown to the adversary.

This can then open them up to the potential of direct denial of service attacks on their internet connections should an adversary choose to do so, and even DNS-type MITM attacks and all the other types of attacks that can be leveled at a hidden service when the physical location of the server is known to the adversary.*

* List of attacks where the location of the server is known (source: https://www.documentcloud.org/documents/807030-ambassade.html#document/p1):

- VAGRANT: Collection of Computer Screens
- MAGNETIC: Sensor Collection of Magnetic Emanations
- MINERALIZE: Collection from LAN Implant
- OCEAN: Optical Collection System for Raster-Based Computer Screens
- LIFESAVER: Imaging of the Hard Drive
- GENIE: Multi-stage operation; jumping the airgap, etc.
- BLACKHEART: Collection from an implanted spook
- DEWSWEEPER: USB hardware host tap that provides a covert link over a USB connection into a target network
- RADON: Bi-directional host tap that can inject Ethernet packets onto the same target

garrettr commented 10 years ago

number-words=999999

@Taipo Fixed with 05f52ec89b

Taipo commented 10 years ago

Thanks for that Garrett

Something else I was thinking about last night: when an attacker attempts to brute force /lookup/, they receive a 404 Not Found on incorrect id values.

POST /lookup/ HTTP/1.1
Host: localhost:8081
Accept: */*
Content-Type: application/x-www-form-urlencoded
Content-Length: 17

id=this+is+a+test

HTTP/1.1 404 Not Found
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: -1
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Date: Wed, 30 Oct 2013 23:25:59 GMT
Server: localhost

Which I presume is generated from:

def store_endpoint(i):
  sid = crypto.shash(i.id)
  loc = store.path(sid)
  if not os.path.exists(loc): raise web.notfound()

I am wondering if this is the best method of dealing with incorrect lookup requests. A brute force attack depends on having the ability to determine whether or not a successful request has been made. Returning a constant 200 OK regardless of the success or failure of a /lookup/ request will make their job a lot harder.
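The constant-response idea can be sketched as follows. All names here are hypothetical (the real code hashes codenames with crypto.shash and checks the filesystem); the point is only that the HTTP status code is the same whether or not the codename exists:

```python
import hashlib
import hmac

# Hypothetical set of hashed codenames known to the server.
_known_hashes = {hashlib.sha256(b"correct horse battery staple").hexdigest()}

def handle_lookup(codename):
    """Return (status, valid). The status is always 200, so the response
    code leaks nothing to a brute-forcer; `valid` drives internal handling
    only (which replies, if any, get rendered into the page)."""
    h = hashlib.sha256(codename.encode()).hexdigest()
    valid = any(hmac.compare_digest(h, known) for known in _known_hashes)
    return 200, valid
```

Note the response body and timing would also have to be indistinguishable for this to fully close the oracle, which is harder than just fixing the status code.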

heartsucker commented 10 years ago

Working on this issue. Creating a Redis store that tracks counts of web events for the past 30 minutes and looks for spikes in activity. Spikes will trigger an email for now, and later will turn on CAPTCHAs on the site.
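A minimal in-memory sketch of that bucketed-counter approach (the class and thresholds are hypothetical; a real deployment would back the per-minute counters with Redis INCR/EXPIRE rather than a dict, which also handles eviction of old buckets):

```python
import time
from collections import defaultdict

BUCKET_SECONDS = 60   # one counter per minute
WINDOW_BUCKETS = 30   # 30 minutes of history
SPIKE_FACTOR = 5.0    # alert when the latest bucket is 5x the baseline

class EventCounter:
    """Count events in per-minute buckets and flag spikes against the
    trailing-window average. Old buckets are never evicted here; Redis
    EXPIRE would handle that in the real version."""

    def __init__(self):
        self.buckets = defaultdict(int)

    def record(self, now=None):
        now = time.time() if now is None else now
        self.buckets[int(now // BUCKET_SECONDS)] += 1

    def is_spike(self, now=None):
        now = time.time() if now is None else now
        current = int(now // BUCKET_SECONDS)
        history = [self.buckets[current - i] for i in range(1, WINDOW_BUCKETS)]
        baseline = max(sum(history) / len(history), 1.0)
        return self.buckets[current] > SPIKE_FACTOR * baseline
```

Feeding the spike signal into a log line (rather than only email) would also make it easy to pick up with OSSEC, per the following comment.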

dolanjs commented 10 years ago

Instead of sending an email, can we feed the event it generates into a log file monitored by OSSEC?

diracdeltas commented 9 years ago

Can we close this? Flask has CSRF protection included.

garrettr commented 9 years ago

There are a lot of interesting ideas and discussion in this issue; however, it is unfocused and the goal is unclear. Here's an update on how things stand in the current version of SecureDrop (0.3.1):

  1. We use cookies for session management, because it makes development much simpler. The cookies are signed, and are used in conjunction with hidden form fields to implement CSRF protection (via flask-wtf).
  2. DoS/automated submissions are not really related topics. We will continue to track DoS mitigation strategies in other issues.

Therefore, this issue is no longer relevant and it can be safely closed.
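For reference, the signing that makes such session cookies tamper-evident can be sketched with stdlib HMAC. This is conceptually what Flask's signed sessions provide, not its actual wire format (all names here are illustrative):

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; Flask reads this from app config

def sign_cookie(value):
    """Encode the value and append an HMAC, so the client can read the
    cookie but cannot forge or modify it without the server's key."""
    payload = base64.urlsafe_b64encode(value.encode()).decode()
    mac = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"

def verify_cookie(cookie):
    """Return the original value, or None if the signature doesn't match."""
    payload, _, mac = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return None  # tampered or forged
    return base64.urlsafe_b64decode(payload).decode()
```

Combined with the hidden-field token from flask-wtf, this is the standard synchronizer-token CSRF setup: the cookie authenticates the session, and the form field proves the request came from a page the server itself rendered.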