Our StreamKeys are insecure

dnwiebe commented 1 year ago

Part of the infrastructure of the MASQ network is a series of virtual data streams that run from browser to server and server to browser. Each of these virtual streams consists of several real TCP streams: from the browser to the originating Node, from the originating Node to the first relay Node, several more TCP streams through the relay Nodes, from the last relay Node to the exit Node, and from the exit Node to the server--and back again, since the virtual streams are bidirectional.

In order to tell these virtual streams apart, we have something called a StreamKey that is generated and assigned to each stream when it is created in response to the browser opening a new TCP stream to the originating Node. The originating Node has a collection of virtual streams keyed by these StreamKeys representing all the streams its browser has open to servers around the Internet, and the exit Node has a collection of virtual streams also keyed by StreamKeys that go from every server it's connected to back to the originating Nodes that have requested data from those servers.

When a browser opens a new TCP stream to what it thinks is a server but is actually an originating Node, the originating Node creates a new StreamKey for the virtual stream it's about to bring into existence. To create the StreamKey, it hashes together (using SHA-1) two pieces of information: its own public key, and the SocketAddr of the browser's end of the new TCP stream.

For example, if the originating Node's public key was gBviQbjOS3e5ReFQCvIhUM3i02d1zPleo1iXg_EN6zQ, and the browser opened a stream from 127.0.0.1:51436, then the StreamKey for that stream would be 67728a4ec9205ce9cc88f22ade0c93498e425ccc.

But there's a problem.

The exit Node is not supposed to know anything about the identity of the originating Node. It can't directly derive anything about the originating Node's identity from the StreamKey it receives, but it does have an important indirect source of information: its neighborhood database. Combine that with the fact that the browser's IP address is always 127.0.0.1 and its port number is an ephemeral port, typically between 49152 and 65535 (just 16,384 possibilities; some operating systems use a different range, such as Linux's default of 32768 to 60999, but that is still only about 28,000 values), and you have an attack vector.

If the exit Node wants to know which originating Nodes it's serving, it can do this:

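use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Try every (known public key, ephemeral port) pair until one reproduces the target StreamKey.
// NodeRecord stands for the neighborhood database's per-Node record type.
fn find_public_key_from_stream_key(
    stream_key: StreamKey,
    neighborhood_database: &[NodeRecord],
) -> Option<(PublicKey, u16)> {
    for node_record in neighborhood_database {
        // Inclusive range: 65535 is a valid ephemeral port too.
        for ephemeral_port in 49152u16..=65535 {
            let browser_addr = SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), ephemeral_port);
            let candidate_key = StreamKey::new(node_record.public_key().clone(), browser_addr);
            if candidate_key == stream_key {
                return Some((node_record.public_key().clone(), ephemeral_port));
            }
        }
    }
    None // originating Node isn't in neighborhood_database
}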

The malefactors who create the evil exit Node don't even have to write StreamKey::new(): it's already part of the codebase!
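For anyone who wants to watch the attack run outside the Node codebase, here is a self-contained sketch. It assumes a hypothetical serialization for the old key (the public key bytes followed by the SocketAddr's text form, e.g. 127.0.0.1:51436), so it will not reproduce the example StreamKey above; old_stream_key() and find_originator() are illustrative names, not Node functions. SHA-1 is written out with std only so that no crates are needed.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Plain SHA-1 (FIPS 180-4) using only std, so the sketch needs no crates.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the message length in bits (big-endian).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());
    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            // The round function and constant change every 20 rounds.
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (word, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *word = word.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Hypothetical stand-in for the old StreamKey::new(): SHA-1 over the public key bytes
/// followed by the SocketAddr's text form. The real serialization may differ.
fn old_stream_key(public_key: &[u8], peer_addr: SocketAddr) -> [u8; 20] {
    let mut input = public_key.to_vec();
    input.extend_from_slice(peer_addr.to_string().as_bytes());
    sha1(&input)
}

/// The brute force from the issue: every known public key times every ephemeral port.
fn find_originator(stream_key: &[u8; 20], known_keys: &[Vec<u8>]) -> Option<(Vec<u8>, u16)> {
    for public_key in known_keys {
        for port in 49152u16..=65535 {
            let addr = SocketAddr::new(IpAddr::V4(Ipv4Addr::LOCALHOST), port);
            if old_stream_key(public_key, addr) == *stream_key {
                return Some((public_key.clone(), port));
            }
        }
    }
    None
}
```

The cost is one hash per (known Node, port) pair, about 16,384 hashes per Node in the neighborhood database: a trivial amount of work for an exit Node.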

Assignment: Find a way to make StreamKeys more secure without disrupting existing data or Gossip protocols between Nodes.

Observation: Since nobody is ever going to try to extract from a StreamKey the data that was used to create it (as opposed to creating new StreamKeys and comparing them), it doesn't really matter what that data is, as long as it's sufficiently unique to prevent unintentional collisions. (Translation: maybe you could just pile in a bunch of random data having nothing to do with public keys or SocketAddrs. How much? Enough to assure uniqueness. How much is that? Well...think about it. If we're going to generate random data anyway, do we even need a hashing function?)
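One possible answer to the questions above, offered only as a sketch: skip the hash entirely and draw each StreamKey straight from the operating system's cryptographically secure random number generator. The 20-byte length (the size of a SHA-1 digest) is an assumption meant to keep the key size unchanged; reading /dev/urandom makes this Unix-only, and real code would more likely use a crate such as getrandom or rand. random_stream_key() and collision_upper_bound() are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical replacement generator: 20 bytes from the OS CSPRNG, no hashing.
/// Unix-only sketch (reads /dev/urandom).
fn random_stream_key() -> io::Result<[u8; 20]> {
    let mut key = [0u8; 20];
    File::open("/dev/urandom")?.read_exact(&mut key)?;
    Ok(key)
}

/// Birthday bound: the probability that any two of `n` uniformly random
/// `bits`-bit keys collide is at most n(n-1)/2 / 2^bits.
fn collision_upper_bound(n: f64, bits: i32) -> f64 {
    n * (n - 1.0) / 2.0 * 2f64.powi(-bits)
}
```

By that bound, 2^32 keys (about four billion streams) at 160 bits have a collision probability of about 2^-97, roughly 6 × 10^-30; even 128 bits would give about 2^-65.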

Another observation: Maintaining uniqueness inside an originating Node will be pretty easy, because the same StreamKey generator will be generating all those StreamKeys. Maintaining uniqueness inside an exit Node might be a little harder, since its StreamKeys will have been generated by different generators in all the originating Nodes that are using its services. There should be some assurance that the fourth StreamKey I generate can't be the same as the fourth StreamKey you generate, in case we're both using the same exit Node.
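The concern above can be made concrete: if every Node simply numbered its keys with its own counter, two originating Nodes sharing an exit Node would collide on every key. The sketch below contrasts that with per-key random values, which need no coordination between Nodes. CountingGenerator, random_key() and count_collisions_at_exit() are hypothetical names, the HashMap stands in for the exit Node's stream table, and reading /dev/urandom makes it Unix-only.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::Read;

/// Hypothetical naive generator: each originating Node counts up from zero.
struct CountingGenerator {
    next: u64,
}

impl CountingGenerator {
    fn next_key(&mut self) -> [u8; 20] {
        let mut key = [0u8; 20];
        key[..8].copy_from_slice(&self.next.to_be_bytes());
        self.next += 1;
        key
    }
}

/// Per-key OS randomness (Unix-only sketch): no shared state between Nodes.
fn random_key() -> [u8; 20] {
    let mut key = [0u8; 20];
    File::open("/dev/urandom")
        .and_then(|mut f| f.read_exact(&mut key))
        .expect("could not read /dev/urandom");
    key
}

/// Simulates an exit Node's stream table; returns how many inserts hit an existing key.
fn count_collisions_at_exit(keys: impl IntoIterator<Item = [u8; 20]>) -> usize {
    let mut streams: HashMap<[u8; 20], usize> = HashMap::new();
    let mut collisions = 0;
    for (i, key) in keys.into_iter().enumerate() {
        if streams.insert(key, i).is_some() {
            collisions += 1;
        }
    }
    collisions
}
```

A fixed random prefix per Node plus a counter would also avoid collisions, but the shared prefix would let the exit Node link all of one Node's streams together, which is exactly what this issue is trying to prevent.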

Yet another observation: UUIDs were invented to address almost exactly this kind of problem, and crates.io has at least one library that generates them. The "almost" is there because UUIDs can be parsed into fields that might give attackers too much information. In version 1 UUIDs, for example, one set of fields comes from a timestamp, which would tell an attacker how old the stream is. (Is this a problem? Maybe.) Another field could allow the extraction of the MAC address of the generating machine's network interface card...although other versions of UUID avoid this: version 4, for instance, is random apart from six version and variant bits. If further investigation reveals UUIDs to be insufficiently secure, we could hash them.
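For comparison, a version 4 UUID is essentially 16 random bytes with six bits overwritten to mark the version and variant (RFC 4122), so there is no timestamp or MAC address to parse back out. The sketch builds one by hand with std only (Unix-only, via /dev/urandom); uuid_v4() and to_hyphenated() are illustrative names. In practice the uuid crate's Uuid::new_v4() (with its v4 feature enabled) would presumably be the simpler choice.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Builds an RFC 4122 version-4 UUID from 16 OS-random bytes (Unix-only sketch).
/// Only 6 bits are fixed; the other 122 are random.
fn uuid_v4() -> io::Result<[u8; 16]> {
    let mut bytes = [0u8; 16];
    File::open("/dev/urandom")?.read_exact(&mut bytes)?;
    bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4 in the high nibble of byte 6
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits 10xxxxxx
    Ok(bytes)
}

/// Canonical 8-4-4-4-12 hex form.
fn to_hyphenated(bytes: &[u8; 16]) -> String {
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
}
```

Because only 122 of the 128 bits are random, a v4 UUID leaves somewhat less collision margin than 20 fully random bytes would.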

utkarshg6 commented 1 year ago

Optional (risky change):

An idea was proposed while working on card 692, which is intended to make --min-hops configurable at runtime.

The idea: use min_hops as one of the inputs when making the StreamKey.

The intent was to give new client requests a different StreamKey once min_hops changes. However, this would require multiple changes in other areas of the code. For example, ongoing payments depend on the StreamKey, and so does letting the server know that the client has closed the connections for the older StreamKeys.

dnwiebe commented 1 year ago

Idea for testing:

First, use a version of Node without this change.
1. Set up a MASQ network in such a way that a particular originating Node can only pick one exit Node. Make sure that exit Node is running at --log-level debug.
2. Save the public key of the originating Node.
3. Connect a browser to the originating Node and load two websites. Simple, static websites are best, but the important thing is that the browser must open more than one TCP stream.
4. Shut everything down and obtain the log from the exit Node. Filter it for log lines that contain the string 'Sending ClientResponsePayload to Hopper:'.
5. Save the result as the "before" logfile.
Now, use a version of Node with this change and do the same thing, but save the filtered log as the "after" logfile.
The next steps will require a bit of programming. If you're comfortable with that, you can do it yourself; if you're not comfortable with it, pass the two logfiles and the two public keys (make sure it's clear which public key goes with which logfile) to somebody who is.
Write some code in a version of the codebase without this change (probably in a temporary "test" somewhere in the Node codebase that can see StreamKey::new()) based on find_public_key_from_stream_key() above to verify that all the distinct StreamKeys in the "before" logfile can be identified as coming from the same Node, and that the port number on that Node can be identified. (Note that you'll need a commit from before this change so that the StreamKey constructor takes parameters.)
Use that code on the "after" logfile to verify that those StreamKeys cannot be identified as coming from the same Node.
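The log-filtering part of the test plan could be sketched like this. The issue doesn't say how a StreamKey is printed in the 'Sending ClientResponsePayload to Hopper:' lines, so the caller supplies the extraction function; distinct_stream_keys() is a hypothetical helper. Each distinct key it returns from the "before" logfile would then be fed to find_public_key_from_stream_key().

```rust
use std::collections::BTreeSet;

/// The string the exit Node's log is filtered on, per the test plan.
const MARKER: &str = "Sending ClientResponsePayload to Hopper:";

/// Keeps only the marker lines and collects the distinct StreamKeys they mention.
/// `extract_key` is caller-supplied because the log format for the key is not specified here.
fn distinct_stream_keys<F>(log: &str, extract_key: F) -> BTreeSet<String>
where
    F: Fn(&str) -> Option<String>,
{
    log.lines()
        .filter(|line| line.contains(MARKER))
        .filter_map(|line| extract_key(line))
        .collect()
}
```

Using a BTreeSet both removes duplicates (one stream usually produces many response payloads) and gives a stable order for comparing the "before" and "after" results.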

Our StreamKeys are insecure #716