A few onion/DHT questions

nazar-pc commented 7 years ago

Reading the spec, I have a few more questions. I'll combine them into a single issue since they are related.

The spec says that the nodes used for an onion path include DHT nodes and TCP relays. Can someone clarify how many nodes Tox keeps (in memory or elsewhere) to choose from when building a new onion path? The spec gives quite a lot of timeout values, but this number seems to be missing. I'm asking because it is very important to have a large number of known nodes that are not connected to each other (meaning they are unlikely to be run by the same person, which is the case as we traverse deeper into the DHT), but collecting them will take a lot of time and bandwidth. In the BitTorrent DHT this is less of an issue, but in Tox it is a crucial piece of anonymity.

The next question probably comes from my lack of general understanding of onion routing, but I'd be grateful for clarification or links to relevant specifications.

So when selecting the nodes for a future onion path, our node would have to connect to them and share public keys with each node that is going to be part of that path. If this is the case, how exactly does this happen?

Also, since we connect directly to those nodes before constructing the onion path, this should reveal a lot of information about the future path to anyone eavesdropping on our Internet connection.

Feel free to point me at specific sections of the spec if I'm missing something obvious.

zugz commented 7 years ago

This part of the design has not been finalised, which is probably why it isn't addressed by the spec. In the current implementation, after an initial bootstrapping phase, the nodes we use when constructing onion paths are those which reply to our onion requests. This is not at all a good approach, as is acknowledged by a comment in the code:

"""
// TODO(irungentoo): remove this and find a better source of nodes to use for paths.
onion_add_path_node(onion_c, ip_port, public_key);
"""

Do you have any suggestions for good ways to choose nodes?

One possibility is to pick a random point in the DHT and find some nodes close to it, and on paths from us to it. This is called a "fake friend" in tox. It's how the initial bootstrapping phase I mentioned above works, and is what I was thinking of using for #547. But close nodes could well be conspiring, so we should be careful with this.

nazar-pc commented 7 years ago

Yeah, a user can easily hit some random node that will only share information about nodes controlled by the same person. If this happens early in the bootstrap process, more of the nodes you know about may turn out to be fake. Since there are ways to create a bunch of nodes on different networks relatively cheaply, and they will be somewhat spread out in the ID space, this might be a serious issue for anonymity.

It looks like I'll have to dig deeper into how Tor implements this (I've looked for videos and presentations, but they only scratch the surface, not the specifics). I have no good suggestions yet, since I don't fully understand the existing implementations or the consequences of the choices that were made.

Thank you for the link, will start reading from there.

TokTok / spec

A few onion/DHT questions #56