Insecure masking key selection?

crossbario / autobahn-python

WebSocket and WAMP in Python for Twisted and asyncio

https://crossbar.io/autobahn

MIT License

2.48k stars 770 forks source link

Insecure masking key selection? #758

Closed njsmith closed 7 years ago

njsmith commented 8 years ago

RFC 6455 says:

The masking key is a 32-bit value chosen at random by the client. When preparing a masked frame, the client MUST pick a fresh masking key from the set of allowed 32-bit values. The masking key needs to be unpredictable; thus, the masking key MUST be derived from a strong source of entropy, and the masking key for a given frame MUST NOT make it simple for a server/proxy to predict the masking key for a subsequent frame. The unpredictability of the masking key is essential to prevent authors of malicious applications from selecting the bytes that appear on the wire.

But if I'm reading the code right, autobahn appears to select masks using random.getrandbits, which uses Python's default Mersenne Twister-based RNG. The Mersenne Twister is inappropriate for security-sensitive applications because its future outputs can be predicted by observing previous outputs. (I.e., seeing the first N masks lets you predict the future masks.)

meejah commented 8 years ago

Yes, that should almost certainly not be using the random module. It should use os.urandom(4) instead, probably.

That said, a 4-byte (32-bit) random value is pretty small -- but I'm not familiar enough with 6455 to know how this really fits in to the security etc. In any case, we should by default use os.urandom for all entropy unless there's a good reason not to, and it's definitely not security-related entropy.

(/cc @oberstet)

oberstet commented 8 years ago

No, the code is fine as it is - there is no security issue here.

There is a misunderstanding of the purpose of client-to-server WebSocket masking here. It was introduced into the spec for this specific scenario: JavaScript running in the browser coming from an untrusted source, attacking an intermediary which does not understand WebSocket by sending deliberately formed byte sequences over the wire.

For JS running on NodeJS, this masking thing is snake oil: one can send arbitrary byte streams over TCP using that environment anyway. Of course the same with any language that allows direct and full TCP socket access.

So key to understanding the spec is understanding the attacker and threat: "authors of malicious applications". There is no server-to-client masking (because servers "are trusted"), there is client-to-server masking because we have browsers which implementors are trusted, but which run potentially malicious code from 3rd parties. You cannot stop some C coded client app sending you (or some intermediary) arbitrary bytes. Same for Python. Or put differently, Python does not expose a scripting sub-environment that would run code less trusted than the Python code in the first place (which can send arbitrary bytes), and which would need shielding and additional containment measures.

Consequently, AutobahnPython even provides a switch to turn off client-to-server masking (for clients) and to turn off the checking for client-to-server masking (for servers), as this saves some CPU cycles.

In fact, we recommend for production:

always use TLS (for security and non-interference from network intermedaries)
use above switches and save some CPU cycles (when using implementations that support that)

meejah commented 8 years ago

Just for my own understanding of the RFC here: the scenario is that a "well-implemented" browser-side library is used as the in-browser WebSocket library, but some 3rd-party JS code in a page sends a message with maliciously-crafted bytes attempting (e.g.) a cross-protocol attack. So the masking is intended to prevent this malicious JS from being able to predict the exact on-the-wire bytes?

(And to expand on @oberstet's point: if you're using the Python library to do the client-side you could e.g. just monkey-patch os.urandom out anyway and 100% predict the masking -- or just open a socket to the server and send whatever bytes you like, without even using AutobahnPython).

oberstet commented 8 years ago

Just for my own understanding ... to predict the exact on-the-wire bytes?

Exactly! This was the original attack scenario the WG had in mind. Whether the spec does a good job of transporting that is a different question;)

Yeah, the monkey patching: this is the final practical nail with trying to enforce the bytes on the wire in a Python library used within a Python app.

oberstet commented 8 years ago

Probably also interesting from this perspective: the "origin" field in the WebSocket opening handshake cannot be forged from browser JS, but again becomes snake oil for all language environments able to implement WebSocket.

njsmith commented 8 years ago

I'm aware that untrusted Javascript + untrusted server + trusted browser is the main threat model that the masking is designed to mitigate, but it wasn't obvious to me from the RFC whether that's the only situation where it's important.

For example, this architecture would obviously be insecure with autobahn's use of MT:

Untrusted javascript
         |
Trusted browser --- [Autobahn-based Websocket proxy] --- dodgy middleware --- untrusted server

...but that's a pretty weird topology, not sure why anyone would implement that.

Or there could be attacks involving a malicious server tricking an otherwise well-behaved client into sending some data that's under the server's control? I guess the simple version of this is: you should never run an autobahn-based echo client against an untrusted server, because that might allow that server to hijack unrelated browser sessions. Though again, I'm not sure why anyone would be running an echo client against random servers. But maybe it's more plausible that some websocket-based protocol would involve the client at some point sending data that originated on the server...

(I'm similarly uncertain on whether secure entropy is needed for the initial handshake nonce -- AFAICT it isn't? But autobahn actually does use os.urandom for this :-).)

njsmith commented 8 years ago

maybe it's more plausible that some websocket-based protocol would involve the client at some point sending data that originated on the server...

[Edit: on further thought, maybe I shouldn't post what AFAICT might be a practical attack here in public...]

For what it's worth, it looks like urandom can probably be made pretty fast. If pulling 4 bytes at a time, it's substantially slower than random.getrandbits (cpython 3.5.2, Linux 4.8.0, x86-64):

~$ pyperf timeit -s 'from os import urandom' 'urandom(4)'
.....................
Median +- std dev: 647 ns +- 6 ns
~$ pyperf timeit -s 'from random import getrandbits' 'getrandbits(32)'
.....................
Median +- std dev: 108 ns +- 1 ns

But if you fetch multiple masking keys at once, then the amortized cost goes way down:

~$ pyperf timeit -s 'from os import urandom' 'urandom(400)'           
.....................
Median +- std dev: 2.52 us +- 0.01 us

That's 100 keys in 2.52 us = 1 key in 25 ns (amortized), which is 4x faster than pulling a key at a time out of getrandbits. Of course there's still some overhead I'm not counting for keeping track of the keys in the batch, and a similar batching trick could probably be used for getrandbits, but my point is that it seems plausible that autobahn could switch to os.urandom without losing any performance compared to where it currently is. (If mask fetching is even a bottleneck to start with.)

njsmith commented 8 years ago

...is there a security@autobahn.ws or similar kind of address where I should re-post the part of the above post that I just edited out? I suppose a bunch of people just got it in their notifications :-(. Hopefully I'm missing something that makes it impractical, and presumably there aren't that many people using autobahn-based clients without encryption through dodgy middleware, but my current best guess is that it is probably exploitable under at least some realistic circumstances.

oberstet commented 8 years ago

As explained, there is no security issue here. But thanks for looking into. Always good.

njsmith commented 8 years ago

Did you even read what I wrote? It's certainly possible that you are correct and there's no security issue here, but you definitely haven't explained why. I just described how to carry out a variant of the cache poisoning attack that works even if the client is only running trusted code. Your response I guess is "don't worry, the client only runs trusted code". Not very reassuring, you know? . I guess if I were a real security researcher then this is where I would put in some bluster about how if I didn't get a real response by X date then I'll make an inflammatory blog post or something, but honestly I'd rather skip that part and just have a conversation...

On Nov 28, 2016 01:48, "Tobias Oberstein" notifications@github.com wrote:

As explained, there is no security issue here. But thanks for looking into. Always good.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/crossbario/autobahn-python/issues/758#issuecomment-263227245, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlOaHxzwYYQrUmkO8HpGYywOAL1w8Swks5rCqNmgaJpZM4K9CUU .

oberstet commented 8 years ago

Your scenario does not work, because the JS does not have the ability to get at the seed presumably used with the PRNG in the AutobahnPython based proxy or wiretap the masked bitstream coming out from the proxy (and hence would be able to compute the seed, and then predict further bytes). Note that the seed is different for each WebSocket connection (and is indeed fed by a true entropy source (urandom)).

njsmith commented 8 years ago

That's not true -- the JS can collude with the server to break the masking. If your argument was correct then browsers wouldn't need to worry about secure masking either, because browsers also don't expose the masking key to JS. . However, that's not the scenario that I'm worried about. In the post after the one you're looking at, I also gave an example of how a malicious server can (AFAICT) easily and straightforwardly attack an autobahn-python client that connects to it, without any JavaScript being involved at all. I had second thoughts about posting this publicly so it's no longer visible in the GitHub issue above, but if you check your email notifications it should be there. (It's the same email that has "pyperf timeit" results.) I'm of course also happy to repost here or in private if you want -- just say the word.

oberstet commented 8 years ago

If you think there is an issue with your other scenario, please post it here, so all can read it ...

Am 28.11.2016 6:03 nachm. schrieb "Nathaniel J. Smith" notifications@github.com: That's not true -- the JS can collude with the server to break the masking. If your argument was correct then browsers wouldn't need to worry about secure masking either, because browsers also don't expose the masking key to JS. . However, that's not the scenario that I'm worried about. In the post after the one you're looking at, I also gave an example of how a malicious server can (AFAICT) easily and straightforwardly attack an autobahn-python client that connects to it, without any JavaScript being involved at all. I had second thoughts about posting this publicly so it's no longer visible in the GitHub issue above, but if you check your email notifications it should be there. (It's the same email that has "pyperf timeit" results.) I'm of course also happy to repost here or in private if you want -- just say the word.

On Nov 28, 2016 3:49 AM, "Tobias Oberstein" notifications@github.com wrote:

Your scenario does not work, because the JS does not have the ability to get at the seed presumably used with the PRNG in the AutobahnPython based proxy or wiretap the masked bitstream coming out from the proxy (and hence would be able to induce the seed, and then predict further bytes). Note that the seed is different for each WebSocket connection (and is indeed fed by a true entropy source (urandom)).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/crossbario/autobahn-python/issues/758#issuecomment-263251892, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlOaGRkRHaST7HahYOLr8fHJ0tbJlk_ks5rCr_cgaJpZM4K9CUU .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/crossbario/autobahn-python/issues/758#issuecomment-263328927, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAOPfONduCmlSFQIYXk8lJhUIcIC1nmLks5rCwl7gaJpZM4K9CUU.

njsmith commented 8 years ago

OK, if that's how you prefer to handle it. . The attack is: . 1) I use a regular vanilla autobahn-python client to connect to ws:// evil.com/evil (unencrypted), through a dodgy intermediary (the kind that the RFC is worried about, that is prone to cache poisoning attacks). . 2) evil.com observes the first few masking keys that my client sends, and uses them to infer the state of my RNG. (This is a well understood problem: see e.g. https://github.com/fx5/not_random . And it looks like getrandbits actually exposes the MT state very directly, making this step particularly easy.) . 3) evil.com predicts the next masking key that my client will use, and sends a malicious PING request. The payload of the PING is one of the pseudo-HTTP-requests described in the [TALKING] paper cited by RFC 6455, pre-masked with my next masking key. For example: "GET / HTTP/1.1\r\nHost: facebook.com\r\n\r\n". This is 38 bytes, so it fits easily into a PING payload. . 4) My client dutifully takes the PING payload, masks it -- this restoring it to cleartext -- and sends it back as a PONG packet. . 5) The server sends back a version of http://facebook.com that's been altered to contain some arbitrary attack code. . 6) The dodgy intermediary observes the exchange in steps (4) and (5), and saves the altered version of http://facebook.com into its cache. . 7) I or someone else behind the intermediary attempt to use our web browser to visit http://facebook.com, and get the cached version delivered. Now evil.com has arbitrary control over my Facebook account. . (Hopefully facebook actually uses HSTS, which should somewhat mitigate this attack. But that's hardly a general solution; not every site uses HSTS, and it doesn't help for first visits anyway.) . AFAICT this means that autobahn-python clients with predictable RNG outputs are exactly as vulnerable to cache poisoning attacks as a browser with a predictable RNG output would be. The scenario requires a few somewhat unusual things -- use of unencrypted websockets, a malicious server, and a dodgy intermediary -- but these are exactly the same requirements as the browser version of the attack. The key insight is that the only thing JS really gives an attacker is that it lets the server control the client's outgoing plaintext, and websocket PING frames give the same capability without JS. . The mitigation is simple and easy: just use os.urandom to choose masking keys, at least over unencrypted connections. This doesn't need to be particularly expensive.

oberstet commented 8 years ago

The key insight is that the only thing JS really gives an attacker is that it lets the server control the client's outgoing plaintext, and websocket PING frames give the same capability without JS.

Even with the original JS scenario, and without any masking, one cannot get complete control over the outgoing bytes (these are not plaintext, as there is no encryption involved with masking).

There are two reasons for this:

WebSocket frame/message headers cannot be controlled (so there will be bytes sprinkled in between that cannot be chosen arbitrarily)
the HTML5 API for WebSocket does only expose a message based API (eg AutobahnPython also has a frame- and a streaming API)

With a WebSocket client that exposes a streaming API, a per-frame mask is useless, since the malicious server can observe the mask from the beginning of the frame and control the client to send something while the first frame is still being sent out.

Using WebSocket ping/pongs for making an arbitrary WebSocket client send specific payload is nevertheless an interesting idea!

However, I still don't think this is practical:

it is easy to trick a browser into opening an arbitrary, untrusted Web site, load untrusted JS from there - however, a Python client is trusted (by definition, as it can send arbitrary bytes anyway), and how do you trick the trusted Python client to connect to an untrusted server remotely?
one needs to observe 624 WebSocket frames first to derive the internal state of MT19937
because of WebSocket frame/message headers, one cannot control the bytes on wire by controlling the payload completely (GET / HTTP/1.1\r\nHost: facebook.com\r\n\r\n will be the payload, but there will be some bytes before these bytes)
because of the 2+3, the intermediary being attacked would need to be broken in a very specific way (don't understand WebSocket, but ignoring the UPGRADE header and not bailing out, then ignore (not bail out) on the noise from 2., then somehow ignoring the initial bytes from the WebSocket header of the forged message from 3., but then again process the payload and (falsely) following the HTTP request contained therein

The mitigation is simple and easy: just use os.urandom

No, it is too slow and drains entropy for important stuff.

But I guess we won't agree on this one. I'd be ready to add two more optional masking modes though:

A) Generate frame masks from os.urandom
B) Set a fixed mask of all 0 in clients and skip XOR unmasking code when mask is all 0 (saving CPU cycles and without requiring entropy)

We can use B) when running over TLS and probably make it default. You can use A) if you believe it is necessary;)

meejah commented 8 years ago

There's a lot of discussion about "draining entopy" from urandom, but the conclusion from cryptographers (that I trust) seems to be: don't worry about that. (Of course, speed is a different thing).

From the paper @njsmith mentions, there are some proxies deployed on the internet that fail in that specific way (i.e. they don't understand the framing enough to know that the ping/pong payload bytes aren't a request). Not very many, but more than zero...so, an option to use os.urandom sounds like a decent compromise (if you're deploying a Python client in some weird or very-adversarial network topology).

I think "B" above (fixed all-0's mask) might be non-spec-compliant? (5.1 says "a client MUST mask all frames that it sends to the server").

So the current situation is that we're using urandom-seeded Mersenne Twister, right? Instead, we could just (always) use a urandom-seeded block-cipher as a (secure) PRNG. Do we unconditionally depend on PyNaCl? It has a random_bytes function with a salsa20 implementation (I'm not sure how you switch whether it's using the "systemrandom" or "salsa20" implementation from the python bindings though).

oberstet commented 8 years ago

So the current situation is that we're using urandom-seeded Mersenne Twister, right?

I'd expect MT to be seeded from real entropy, yes. We use the global one (from stdlib).

I think "B" above (fixed all-0's mask) might be non-spec-compliant?

From the letter of the spec, yes. From the intention of the spec, no. Practically (and that means, running over TLS), it just burns CPU additional cycles without gain.

"no" because the bytes on the wire for a TLS connection cannot be selected from app code using a TLS implementation, and presumably broken intermediaries hence cannot be deliberately spooked.

MITM proxies which can unwrap TLS, but then either are broken themselfes (dont understand WS) or send bytes downstream over a non-TLS and broken intermediary: yes, right. I'm out;)

Note that AB (deliberately) deviates from the spec already: we fail by hard dropping a connection upon protocol violation without performing a closing handshake by default (you can override that). Trying to continue talking to a broken client increases the number of things that can go wrong down the line.

Sidenote: the IETF WG spent many many hours fighting about masking - and the HTTP/2 WG again (they don't have any masking btw!). With "this is snake oil" up to "it's not enough: we need to reinvent full encryption" opinions. This is the "rough consensus" part of the IETF at work;)

njsmith commented 8 years ago

However, I still don't think this is practical:

To be clear, all I'm claiming is that it's ~equally practical as the corresponding attack on browsers, so that if browsers need to use a secure RNG (i.e., if we believe that admonition in RFC 6455 at the top of this thread), then autobahn needs to as well.

it is easy to trick a browser into opening an arbitrary, untrusted Web site, load untrusted JS from there - however, a Python client is trusted (by definition, as it can send arbitrary bytes anyway), and how do you trick the trusted Python client to connect to an untrusted server remotely?

I don't find this very reassuring. If I use requests to fetch a page from a remote server, then I don't generally expect that I'm letting that remote server take control over other random pages on other domains accessed by other programs. Similarly, if I use autobahn to talk to a remote server, that doesn't mean I trust that remote server with my credit card...

one needs to observe 624 WebSocket frames first to derive the internal state of MT19937

Yes. This doesn't seem like a problem in practice, and is the same for both browsers and autobahn.

because of WebSocket frame/message headers, one cannot control the bytes on wire by controlling the payload completely

Yes. This is the same for both browsers and autobahn.

because of the 2+3, the intermediary being attacked would need to be broken in a very specific way

Yes. This is the same for both browsers and autobahn.

The mitigation is simple and easy: just use os.urandom

No, it is too slow and drains entropy for important stuff.

As @meejah noted, "draining entropy" is not a real thing. As for speed -- have you measured it?

Current code in autobahn:

$ pyperf timeit -s 'import struct, random' 'struct.pack("!I", random.getrandbits(32))'                       
.....................
Median +- std dev: 282 ns +- 6 ns

Code from this gist:

$ pyperf timeit -s "from quickurandom import masking_keys; mk = masking_keys()" "next(mk)"        
.....................
Median +- std dev: 247 ns +- 6 ns

-> urandom can easily be made ~12% faster than what you're doing now. If it even matters. Is picking the masking key really a bottleneck as compared to, say, XORing the mask, or like, doing anything at all in Python? :-) Right now you're spending 2/3 of the time not in picking the random masking key, but just in Python data structure manipulation (getrandbits alone runs in ~110 ns).

We can use B) when running over TLS and probably make it default. You can use A) if you believe it is necessary;)

I'm not bothered by choosing (B) over TLS as a default, though I guess it might make sense to double-check with Real Security Researchers :-). Presumably the WG had some reason that they didn't make this the default in the first place, and it might be useful to double-check what that was just in case.

But over non-TLS connections, then I think urandom really should be the default. As shown above, it doesn't have to be any slower, and "secure" is a much better default than "insecure" -- the people who should flip the "turn on security" switch generally are exactly the ones who don't realize that they should. Let the people who need to eke out every last bit of speed and are able to research the trade-offs be the ones who get to making active decision to flip the switch. And people generally expect a websocket implementation to have the security properties described in RFC 6455.

I'd expect MT to be seeded from real entropy, yes.

Yes, Python's MT is seeded from urandom (unless someone explicitly sets the seed).

njsmith commented 8 years ago

Ah, this seems to be the start of a thread where it was discussed whether masking should apply to wss://.

An interesting thing I learned from the discussion: it's actually not true that TLS acts as masking (!). Depending on what TLS cipher suite is in use, a malicious server that can control the client's plaintext can control the bytes the client sends on the wire. This isn't too surprising once you realize that of course the server knows what encryption key the client is using -- TLS is designed to protect against third-party eavesdroppers.

oberstet commented 8 years ago

@njsmith I am reopening this, because you've spent significant time on it, and I don't have time right now to continue the discussion (I am not convinced) - but I don't want to leave an impression I'd be brushing this off.

Eg rgd your speed test: something is wrong here .. real entropy comes from syscall, and at 250ns, that would mean 4 million/sec - unlikely. My guess there might be some caching/batching in userland .. either Python stdlib or glibc? Further: how does it scale with multiple processes / threads calling into os.urandom? This is important for us too (well, in Crossbar.io)

njsmith commented 8 years ago

rgd your speed test: something is wrong here .. real entropy comes from syscall, and at 250ns, that would mean 4 million/sec - unlikely. My guess there might be some caching/batching in userland .. either Python stdlib or glibc

The quickrandom.masking_keys generator that I linked to and gave benchmarks for is a 5 line function that does explicit batching. This adds a small amount of memory overhead for the buffer (400 bytes as implemented there, plus a bit more for the generator stack frame etc.), which is per client or per thread if you want to avoid locking. The batching only gives a speedup of ~2.5x, though... urandom is just not that slow. But, I mean, I'm just reporting what happens on my laptop, if you want to check for yourself you can always copy/paste my code and try it :-).

Also worth noting that the numbers there are with CPython 3.5, and there was a significant rework of Linux os.urandom in 3.5 (switching to using getrandom), so it's possible the numbers will look different on 2.7.

how does it scale with multiple processes / threads calling into os.urandom?

Well, this is one of those questions where the only possible way to answer is to try running the workload you care about on the full system you care about and see what happens :-). In modern kernels random.c maintains multiple internal CPRNGs so that multiple CPUs can call into urandom simultaneously without contending, so in theory you "ought" to be fine. For whatever that's worth.

because you've spent significant time on it, and I don't have time right now to continue the discussion (I am not convinced) - but I don't want to leave an impression I'd be brushing this off

Don't worry about me :-). It's your project and you should obviously make whatever decision you feel is best for your users. I started looking into this because I was poking at wsproto and trying to figure out what the "right thing" even was; if nothing else this conversation has definitely helped clarify for me what the actual trade-offs are.

Like I said, I personally would not choose to violate the RFC and potentially put some small but unknown number of users at risk in order to get a small and potentially irrelevant speedup. I have no idea how many broken intercepting proxies are out there on the network, or even if any exist. The big browser vendors seem to think it's a risk worth worrying about, and they probably have as much data as is possible for anyone to get (even though I'm sure they wish they had more too!), so I'm inclined to follow their lead. But, I mean, you're the one who users will complain to if things are too slow, and you're the one who'll have to explain to them why you decided to go out of your way to violate the RFC if they do get pwned :-). And either way, at this point I think the thread has mostly served its purpose of making clear what the actual risks/trade-offs are (at least as well as we can know them), so now it's just a judgment call.

oberstet commented 7 years ago

won't change