Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

DHT node spam from exit nodes #3065

Open Captain-Coder opened 7 years ago

Captain-Coder commented 7 years ago

During the work on Gumby's DHT-isolation module I noticed the following.

For each circuit length (1 hop, 2 hops, 3 hops, etc.) Tribler sets up an instance of libtorrent. This instance has its DHT enabled. The DHT traffic generated by the libtorrent instance is tunneled to the exit node, from where it is forwarded to the internet. So effectively each exit node appears as a single IP hosting many DHT nodes, each with its own short-lived port number.

However, most of the nodes forming the public DHT employ measures to combat this sort of node spam from single IPs. Many will not even consider multiple IPs from a single /24 subnet in each routing table bucket. We could end up on blacklists, simply get no replies, or get flaky service. On top of that, the high node churn due to circuits closing and reopening on different ip:port pairs does not help matters and makes an exit's IP look very unreliable. The tunneled libtorrent DHTs will not end up in routing tables and will not perform their function as intended.
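To make the /24 restriction concrete, here is a minimal sketch of a routing table bucket that admits at most one node per /24 subnet. The `Bucket` class, its capacity of 8, and the example addresses are assumptions for illustration; real DHT implementations differ in the details.

```python
import ipaddress


class Bucket:
    """Hypothetical k-bucket that admits at most one node per /24 subnet."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.nodes = {}  # maps /24 network -> (ip, port)

    def try_add(self, ip, port):
        # Collapse the address to its /24 network, e.g. 203.0.113.7 -> 203.0.113.0/24.
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        if subnet in self.nodes or len(self.nodes) >= self.capacity:
            return False
        self.nodes[subnet] = (ip, port)
        return True


bucket = Bucket()
exit_ip = "203.0.113.7"
# Three tunneled libtorrent instances surfacing on one exit node, each on its own port.
results = [bucket.try_add(exit_ip, port) for port in (40001, 40002, 40003)]
print(results)  # [True, False, False]
```

Under this policy only the first tunneled instance behind an exit can enter a given bucket; the others are dropped no matter which port they use.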

Notice that even the default (non-anonymous) libtorrent instance conflicts with the pymdht instance that Tribler runs on the same host to support e2e encryption, though this effect is probably smaller and has less impact.

One proposed solution is as follows:

The pymdht layer should be checked to see if it does caching right, so that it does not spam messages (either announce or lookup). If it does spam, we are likely to trigger flood defences in other DHT implementations, for example when everyone is downloading their favorite new torrent.
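As a rough illustration of what "caching right" could mean, the sketch below puts a time-limited cache in front of a lookup function so that repeated requests for the same infohash cause only one network lookup per TTL window. `LookupCache`, the 300 second TTL, and `fake_lookup` are hypothetical and are not part of pymdht.

```python
import time


class LookupCache:
    """Hypothetical TTL cache placed in front of a DHT get_peers lookup."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # infohash -> (expiry time, peers)

    def get_peers(self, infohash, do_lookup):
        now = self.clock()
        cached = self.entries.get(infohash)
        if cached is not None and cached[0] > now:
            return cached[1]  # still fresh: no DHT traffic generated
        peers = do_lookup(infohash)
        self.entries[infohash] = (now + self.ttl, peers)
        return peers


calls = []


def fake_lookup(infohash):
    # Stand-in for a real DHT lookup; records how often the network would be hit.
    calls.append(infohash)
    return [("198.51.100.1", 6881)]


fake_now = [0.0]
cache = LookupCache(ttl=300.0, clock=lambda: fake_now[0])
# Ten clients behind the same exit ask for the same popular torrent.
for _ in range(10):
    cache.get_peers(b"\x01" * 20, fake_lookup)
print(len(calls))  # 1
```

The same idea would apply to announces: remember when an infohash was last announced and skip the repeat until the entry expires.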

From what I understand of the tunnel code, the DHT messages are already encrypted and are only seen unencrypted by the circuit origin and the exit, so the privacy of DHT requests should be good. However, the person fixing this issue should probably first investigate whether my intuition and understanding are accurate, or otherwise think through the privacy concerns of the proposed fix.

Also, if BEP42 (DHT security extensions) ever gets supported, our problems will multiply. BEP45 has some interesting observations about many-IP/multi-homed DHTs that also apply to our current libtorrent DHTs, which hop exit addresses every so often. This should be fixed by using the exit's pymdht instance for everything.
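To show why BEP42 would make things worse, here is a rough sketch of its IPv4 node ID rule as I read the draft: the first 21 bits of the node ID are derived from a CRC-32C over the masked external IP and a 3-bit random value. The function names are hypothetical, and the mask and byte layout should be checked against the BEP before relying on them.

```python
import ipaddress
import os


def crc32c(data):
    """Bitwise CRC-32C (Castagnoli), the checksum BEP42 refers to."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def bep42_prefix(ip, r):
    """First three ID bytes implied by an IPv4 address (low 3 bits of byte 2 are free)."""
    masked = int(ipaddress.IPv4Address(ip)) & 0x030F3FFF  # IPv4 mask as read from the draft
    masked |= (r & 0x7) << 29
    crc = crc32c(masked.to_bytes(4, "big"))
    return bytes([(crc >> 24) & 0xFF, (crc >> 16) & 0xFF, (crc >> 8) & 0xF8])


def make_node_id(ip, r):
    """Build a 20-byte node ID whose prefix matches the given external IP."""
    prefix = bep42_prefix(ip, r)
    third = prefix[2] | (os.urandom(1)[0] & 0x7)
    return prefix[:2] + bytes([third]) + os.urandom(16) + bytes([r])


def is_valid(node_id, ip):
    """Check the first 21 bits of node_id against the sender's IP."""
    expected = bep42_prefix(ip, node_id[19])
    return node_id[:2] == expected[:2] and (node_id[2] & 0xF8) == expected[2]


node_id = make_node_id("203.0.113.7", r=1)
print(is_valid(node_id, "203.0.113.7"))   # ID generated for this exit address
print(is_valid(node_id, "198.51.100.9"))  # same ID seen from a different exit address
```

An ID generated for one exit address will almost certainly fail this check once the circuit moves to another exit, so every hop between exits would force a new node ID.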

Links:
http://www.bittorrent.org/beps/bep_0005.html (DHT standard)
http://www.bittorrent.org/beps/bep_0042.html (draft, DHT security)
http://www.bittorrent.org/beps/bep_0045.html (multi-homed DHT instances)

devos50 commented 5 years ago

@egbertbouman is this issue addressed with the implementation/deployment of our own DHT overlay?

egbertbouman commented 5 years ago

@devos50 Unfortunately, it's not. We're not using our own overlay for bittorrent DHT lookups.

synctext commented 4 years ago

This is now a critical issue for #3868. We can't get swarm stats due to DHT security. No idea for a fix yet!

qstokkink commented 4 years ago

One possibility is to pay out for DHT lookups and distribute the load in the network.

synctext commented 3 years ago

We conducted a DHT reliability experiment. We have seen the effect of the libtorrent DHT rate limit (5 messages per second) in the wild.
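A simplified sketch of how such a limit could play out on the receiving side, assuming it is enforced per source IP over one-second windows. `SourceRateLimiter` and the 300 second block duration are assumptions for illustration; this is not libtorrent's actual code.

```python
from collections import defaultdict


class SourceRateLimiter:
    """Hypothetical per-IP limiter: block a source that exceeds N packets in one second."""

    def __init__(self, max_per_second=5, block_seconds=300.0):
        self.max_per_second = max_per_second
        self.block_seconds = block_seconds  # assumed block duration
        self.window = defaultdict(lambda: [0, 0])  # ip -> [current second, packet count]
        self.blocked_until = {}

    def allow(self, ip, now):
        if self.blocked_until.get(ip, 0.0) > now:
            return False  # source is still blocked
        second = int(now)
        state = self.window[ip]
        if state[0] != second:
            state[0], state[1] = second, 0  # a new one-second window starts
        state[1] += 1
        if state[1] > self.max_per_second:
            self.blocked_until[ip] = now + self.block_seconds
            return False
        return True


limiter = SourceRateLimiter()
exit_ip = "203.0.113.7"
# Three circuits share one exit IP; together they send 6 packets within the same second.
verdicts = [limiter.allow(exit_ip, now=10.0 + i * 0.1) for i in range(6)]
print(verdicts)  # [True, True, True, True, True, False]
```

Because the limit is counted per IP, all circuits leaving through one exit share a single budget, so the exit can be blocked even when each circuit on its own stays well under the limit.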

egbertbouman commented 3 years ago

Just some quick graphs. The first one shows the number of responses received after sending 1000 requests to 100 random DHT nodes at different rates. The second one shows the percentage of peers that blocked us during the experiment.

I'm not sure how trustworthy these figures are considering fewer peers blocked us at higher request rates. That could also be due to an issue on our side.

[Figure: dht_responses_100_peers]

[Figure: dht_blocked_100_peers]

hbiyik commented 2 years ago

Don't want to sound like a jerk, but don't you think that the lack of exit nodes is basically making the whole network centralized? DHT spam looks like a part of the problem.