bittorrent / bittorrent.org

393 stars 101 forks source link

the announce logic in BEP 7 should be changed #100

Open arvidn opened 4 years ago

arvidn commented 4 years ago

As far as I can tell, all implementations have (perhaps implicitly) agreed not to implement (EDIT: the &ipv4= and &ipv6= announce parameters from) BEP 7, but instead always announce once per source interface.

I personally support rejecting updating BEP7 and possibly replacing it with a BEP describing the expected announce-logic for clients with multiple external IP addresses/interfaces.

@ssiloti @the8472 thoughts?

the8472 commented 4 years ago

That sort of was the intent of the 2017 change to that BEP anyway.

One issue is support v4 peers on a v6-only tracker and vice versa. This is sort of a niche case on the open internet but might be more reasonable in closed/controlled environments.

So I would say the 1-announce per address family or per source address approach is preferred but it might make sense to keep the parameters as historic / niche uses?

anacrolix commented 4 years ago

FWIW, my implementation does implement BEP 7 (and also BEP 15 for UDP).

arvidn commented 4 years ago

@the8472 I seem to recall the one-announce-per-interface being documented somewhere, but I couldn't find it (I didn't look very hard I suppose). That was a PR against BEP7 from you then?

I think a reason to not implement BEP7, apart from the inaccuracy in predicting ones' own external IPs, is the potential for leaking possibly private information between trackers.

It would be very easy for an implementation to accidentally leak an IPv4 or IPv6 address when announcing over a Tor-circuit, a VPN or a proxy.

X-Coder264 commented 4 years ago

As far as I can tell, all implementations have (perhaps implicitly) agreed not to implement BEP 7, but instead always announce once per source interface.

Is that really the case?

AFAIK your libtorrent (as well as rakshasa's libtorrent) implement BEP 7 or am I wrong? These two libtorrents cover most of Linux torrent clients and a good part of windows ones. The biggest "player" on Windows is utorrent and I haven't used it in a while but I would be extremely surprised if it didn't implement BEP 7.

On the tracker server side all trackers that are not prehistoric support BEP 7. My tracker implements it as well (https://github.com/X-Coder264/tracker).

I personally support rejecting BEP7 and possibly replacing it with a BEP describing the expected announce-logic for clients with multiple external IP addresses/interfaces.

BEP 7 is here already for a couple of years and pretty much all trackers that I know of have implemented it. One reason for that is that BEP 23 (compact response) allows only for IPv4 IPs (and all trackers return compact response by default nowadays) so in order to return IPv6 IPs in compact responses BEP 7 must be implemented.

I think that BEP 7 has actually proven itself and if anything it should be accepted as it's an accepted standard by now even though the BEP itself is still officially just in draft state unless there are some big issues regarding it, but I haven't really seen any of them being voiced by either client or server implementation maintainers (nor from the users).

Discarding BEP 7 and announcing once per source interface would not be a tiny change (at least for server implementations). All BEP 7 stuff (ipv6 parameter handling etc) would need to be discarded and the code in general would have to be adjusted to account for multiple announces from the same peer on the same torrent in the same announce cycle (let's say it's 40 min), but from different IPs (for example the rate-limiting logic which prevents hammering would probably need to be readjusted, depending on the implementation).

Currently let's say there is 1 announce per torrent in an announce cycle. BEP 52 now makes that two for hybrid torrents (one for v1 and one for v2 swarm). Announcing once per source interface in this case would mean that there would be 4 announces (1 for v1 IPv4 interface, 1 for v1 IPv6 interface, 1 for v2 IPv4 interface and 1 for v2 IPv6 interface) for a single torrent for a single peer.

arvidn commented 4 years ago

As far as I can tell, all implementations have (perhaps implicitly) agreed not to implement BEP 7, but instead always announce once per source interface.

Is that really the case?

AFAIK your libtorrent (as well as rakshasa's libtorrent) implement BEP 7 or am I wrong? These two libtorrents cover most of Linux torrent clients and a good part of windows ones. The biggest "player" on Windows is utorrent and I haven't used it in a while but I would be extremely surprised if it didn't implement BEP 7.

I removed it in libtorrent, and then carefully (and somewhat reluctantly) added it back for private torrents, when anonymous_mode is disabled, because those trackers seem to rely on it. I would like to see a future with it removed completely.

The reason to reject BEP 7 (in my opinion) isn't that it's difficult to implement, it's that it has several serious flaws. Briefly:

  1. It's a DDoS attack vector, as you can announce other people's IPs, and have a (possibly) large swarm make connection attempts. e.g. you can announce a victim IP to thousands of the largest swarms. I see you also support the &ip= parameter, which also introduces this problem. Additionally, this parameter was never documented in the specification. It was a hack a guy at bittorrent inc. added to the first python tracker just to be able to run the tracker and the initial seed on the same machine (and announce to the tracker over loopback, but still record it under the correct external IP).

  2. If one is using a VPN that only supports IPv4, you may leak your identity via your IPv6 address. The opposite can also happen, but is probably less likely in practice.

  3. If one has multiple IPv6 and/or IPv4 addresses, it's not clear whether to include multiple &ipv6= and &ipv4= arguments or not. (your implementation seems to expect no more than one each). If you support multiple arguments, you're making problem (1) worse. If you don't, it's non-trivial (for a client) to know which IPv6 address to include when multiple are available.

One reason for that is that BEP 23 (compact response) allows only for IPv4 IPs (and all trackers return compact response by default nowadays) so in order to return IPv6 IPs in compact responses BEP 7 must be implemented.

That's a good point. Perhaps this ticket should suggest amending BEP 7 to specify a more reasonable announce logic. Alternatively, BEP 23 could be updated to specify the peers6 key.

unless there are some big issues regarding it, but I haven't really seen any of them being voiced by either client or server implementation maintainers (nor from the users).

I realize these discussions have mostly happened elsewhere on the internet, there are a few tickets on the libtorrent repo, probably a thread or two on the libtorrent mailing list, and some in person. You could consider this post voicing of such issues.

Currently let's say there is 1 announce per torrent in an announce cycle. BEP 52 now makes that two for hybrid torrents (one for v1 and one for v2 swarm). Announcing once per source interface in this case would mean that there would be 4 announces (1 for v1 IPv4 interface, 1 for v1 IPv6 interface, 1 for v2 IPv4 interface and 1 for v2 IPv6 interface) for a single torrent for a single peer.

That's right. What I was alluding to in the other thread is that all of those can be treated as completely separate swarms. Although, the IPv6 and IPv4 could also return peers across the address families, as a potential improvement (under the assumption that any IPv6 peer also has an IPv4 peer). However, there's no point in sharing any state between the bittorrent-v1 and bittorrent-v2 swarms (afaict). So, those announces are trivially parallelizable.

Also, one IPv4 and one IPv6 is really just an example (albeit probably the most common multi-home one), you can also imagine a peer with 2 IPv4 addresses, or 1 IPv4 and 2 IPv6 etc.

actually, one possible improvement to BEP 52 would be to announce both v1 and v2 hash in a single request. I think that would only add one additional info_hash parameter. Say, &info_hash2=.

X-Coder264 commented 4 years ago

I see you also support the &ip= parameter, which also introduces this problem. Additionally, this parameter was never documented in the specification. It was a hack a guy at bittorrent inc. added to the first python tracker just to be able to run the tracker and the initial seed on the same machine (and announce to the tracker over loopback, but still record it under the correct external IP).

Yes, I support the &ip= parameter. It actually is documented in BEP 3:

ip An optional parameter giving the IP (or dns name) which this peer is at.

But actually if the IP from where the request came from is different than the value stated in the &ip= parameter I override it because as you said anything can be sent in the &ip= parameter.

3. If one has multiple IPv6 and/or IPv4 addresses, it's not clear whether to include multiple `&ipv6=` and `&ipv4=` arguments or not. (your implementation seems to expect no more than one each). If you support multiple arguments, you're making problem (1) worse. If you don't, it's non-trivial (for a client) to know which IPv6 address to include when multiple are available.

Yes, I expect no more than one each as BEP 7 doesn't say anything about the possibility of sending more of them. I don't even know what the HTTP standard says about duplicate query parameters. IIRC the last one overrides all the previous ones. So including multiple &ipv6= and &ipv4= arguments doesn't seem like a viable option at all IMO (or it should be &ipv6[]= and &ipv4[]=).

That's a good point. Perhaps this ticket should suggest amending BEP 7 to specify a more reasonable announce logic. Alternatively, BEP 23 could be updated to specify the peers6 key.

I'd say both are needed. Amending BEP 7 is definitively needed to specify a more reasonable announce logic in light of your raised concerns (DDoS attack vector and multiple IPv6 and/or IPv4 addresses etc).

BEP 23 could be amended with the peers6 key so that everything about compact responses is in one place.

However, there's no point in sharing any state between the bittorrent-v1 and bittorrent-v2 swarms (afaict).

Yes, we agreed about that already in https://github.com/bittorrent/bittorrent.org/issues/87.

So, those announces are trivially parallelizable.

The "problem" is not if it's easily parallelizable or not. It's about creating a system/specification where the client has to make the least amount of announces per torrent as possible to properly announce everything. Because more requests that need to be handled = higher costs for the servers in order to handle them (be it more servers or the same number of stronger servers). Keep in mind that (at least) for private trackers there is a bunch of additional stuff going on on the announce endpoint than just updating the peer data and returning a list of peers on the torrents (loyalty points, recording the stats and snatches depending on a lot of conditions and a bunch of other logic) so let's say quadrupling the number of requests for a torrent really isn't cheap in terms of server power needed to handle that.

actually, one possible improvement to BEP 52 would be to announce both v1 and v2 hash in a single request. I think that would only add one additional info_hash parameter. Say, &info_hash2=.

Yes, if the final BEP 52 goes with option (b) from https://github.com/bittorrent/bittorrent.org/issues/87 (which seems most likely at this point IMO) then that would be really nice to have in the BEP so that there is no need for two request for hybrid torrents.

arvidn commented 4 years ago

The biggest "player" on Windows is utorrent and I haven't used it in a while but I would be extremely surprised if it didn't implement BEP 7.

utorrent removed the &ipv6= announce parameter in 2014. It never supported &ipv4= (but always supported &ip=, but it's special, it can only be configured to be sent manually)

arvidn commented 4 years ago

It actually is documented in BEP 3

Oh, I had forgotten about that apparently. My recollection is that it was a bit frowned upon from very early on.

the8472 commented 4 years ago

@arvidn perhaps there's some confusion here. BEP7 defines 3 things:

This issue is just about the 2nd point, no? Perhaps the title should be adjusted to reflect that.

Because more requests that need to be handled = higher costs for the servers in order to handle them (be it more servers or the same number of stronger servers).

BEP7 is about IPv6 HTTP announce handling. Servers that are short on resources are encouraged to use BEP15 which can be implemented very efficiently (millions of clients handled by a single machine)

arvidn commented 4 years ago

yes, I realized BEP 7 was more than just the announce parameters. It is correct that I'm only concerned about the &ipv6= and &ipv4= paramteters.

I proposed a change here:

https://github.com/bittorrent/bittorrent.org/pull/101

arvidn commented 4 years ago

There's another issue related to the &key= parameter raised recently (here).

The original use of the &key= parameter is for a tracker to be able to tell if a peer change its IP between announces, and update its entry accordingly. i.e. if the key has already been announced from a different IP, it is be treated the same and the old IP is replaced. The reason not to use &peer_id= for this purpose is because that's publicly known (peers tell their ID to each others when they connect), which would allow anyone to effectively remove someone from the tracker by announcing with their peer_id.

In a world where a peer can be multi-homed, each source IP address has better use different keys, otherwise the tracker may know it's a single peer with a single IP that changed between the announces.

However, private trackers like to reliably keep stats, and at least some use the key parameter to identify peers, and using multiple source IPs (and multiple keys) cause them to double count. As reported here.

The current wording for the &key= parameter is pretty clear that it should be a constant identifying a peer for the duration of the session.

I get the impression that essentially nobody is using &key= for the purposes of replacing a previous peer entry. In which case it would make the most sense for me to fix libtorrent to comply to the current wording. I was surprised to find that BEP 3 doesn't actually mention the &key= parameter, so that seems to support this approach.

If anyone at some point would like to be able to keep peer lists updated as peers change IP, we could introduce yet another parameter (say, &if=) to specify some index of which local IP this is coming from.

the8472 commented 4 years ago

I don't see how those uses are in conflict. In a multi-homed setup a client can still use the same key for all interfaces but also switch keys between sessions. Granted, this does lead to sub-optimal privacy as the new session could reassociated when the IP stayed the same, but that sort of is the use of the key parameter anyway.

A more privacy-focused implementation that does not care about what private trackers do simply would not support the key parameter at all, use randomized peer IDs etc..

arvidn commented 4 years ago

the original use (at least per my understanding, but doesn't seem to be implemented anywhere) was that a tracker would use the &key= as the single unique identifier for a peer in the swarm. whenever it gets an announce with that key, it overwrites the previous IP was used to announce from.

A multi-homed peer, one with 2 IPv4 addresses for example, would announce twice from two separate IPs. A tracker as described above would interpret that as the peer moving from one IP to the other, and only the last announced IP would be recorded in its peer list.

the8472 commented 4 years ago

Ah I see what you mean. I mostly wrote that with different address families in mind and not with multiple addresses within a single family.

Technically the current definition doesn't prescribe what trackers should actually do with the key, it only tells clients to send it. A tracker could discard the old IPs, but that would simply mean a multi-homed client would appear under the last of its announced ones, which wouldn't be too bad either.

I guess the primary users of it are private trackers anyway, so the current state of things should be fine.

wangqr commented 4 years ago

Servers that are short on resources are encouraged to use BEP15 which can be implemented very efficiently

BEP15 is not encrypted at all, and I don't see any chance it will be used by PTs, most of which started using https years ago.

It's a DDoS attack vector, as you can announce other people's IPs

No it is not. Let's say a PT has n active clients in total. Each client has x torrents and 2 addresses. Each torrent has y peers.

Obviously n>y, so nx>xy. In a typical middle-sized PT, n=3000, x=150, y=20. So it is just directing less than one percent of the trackers traffic to the victim's machine. And for public trackers with huge n, the double announce will lead to huge difference.

It's probably a good idea to keep all the IPs in the peer database.

If we allow multiple IPs from same address family, things get worse. Most home cable users gets a /64 IPv6 prefix deligation, that's 1.8e+19 addresses. A malicious client can simply fill the tracker's database with those addresses and significantly increase the announce response size, taking up the outgoing bandwith of the tracker.

arvidn commented 4 years ago

It's a DDoS attack vector, as you can announce other people's IPs

No it is not. Let's say a PT has n active clients in total. Each client has x torrents and 2 addresses. Each torrent has y peers.

  • Case 1: One of the peers is malicious and sends a victim's IP in ip= part for all x torrents it have. Now the victim will receive xy connections per announce interval.

Peers are not subject to the announce interval. Normally clients will try to connect to a peer it's heard about a few times before giving up. Say 3 times. So the magnification is not very impressive, but it still seems like an obvious DDoS attack vector, granted, not a very effective one.

  • Case 2: We modify the standard so each client have to announce twice using different addresses. Now the tracker will receive 2nx announces per announce interval, that is nx extra announces per announce interval. They are well-formed requests, and the tracker now must spend resources handling them.

Sure, if it's too expensive for trackers to connect all the peers of the swarm together, they can't do that. But those resources you have to spend as a tracker have 100% utility, because they improve connectivity of the swarm.

How is this different from today though? Multiple announces are allowed today. At least there's nothing in the protocol preventing it or discourage it.

Yes, the tracker can put rate limit here or drop the announce below minimum interval, that's what PT does right now, and that's the reason why they don't support double announce and rely on &ipv6=.

How does it decide which announces to drop? the peer_id? (if so, that's another DoS attack. since peer_ids are publicly known).

Obviously n>y, so nx>xy. In a typical middle-sized PT, n=3000, x=150, y=20. So it is just directing less than one percent of the trackers traffic to the victim's machine. And for public trackers with huge n, the double announce will lead to huge difference.

You're suggesting that almost everyone on the internet is multi-homed. I think that the number of peers that are is insignificant in comparison to the rest of the internet. But even in a distant future where most peers are multi-homed, it would still be desirable to connect over all available paths, so you multiple announces would provide 100% utility. The swarm would be better.

If the tracker can't keep up with demand of IP addresses to distribute among peers, fine, it can't keep up. That might as well be a problem without any peer being multi-homed. The case where there are too many IP addresses to distribute in the world isn't really part of that problem.

It's probably a good idea to keep all the IPs in the peer database.

If we allow multiple IPs from same address family, things get worse. Most home cable users gets a /64 IPv6 prefix deligation, that's 1.8e+19 addresses. A malicious client can simply fill the tracker's database with those addresses and significantly increase the announce response size, taking up the outgoing bandwith of the tracker.

I don't get this point. How is this different from only having single-homed peers today? If two peers have one IPv4 address each, and they both announce, the tracker will accept those two announces from the same address family, right?

I would say the bigger problem in your scenario is that you could flood the tracker with fake clients (at "bogus" IPv6 addresses), and DoS the swarm that way. This is something trackers have to deal with regardless though.

wangqr commented 4 years ago

How is this different from today though?

Today one of them is dropped. That is, if announced twice, we have 2 queries and 1 update on server side. If the tracker is going to accept double announce, then that's 2 queries and 2 updates. Updates are much more expensive, as PT need to calculate multiple scores, and do automated banning / warning based on those stats provided in announce.

How does it decide which announces to drop? the peer_id?

Yes, but that cannot be exploited in DoS. PT has a passkey in the announce URL which identifies the user account. The passkey validation happens first, before the query to the peer table which looks up peer_id and the last announce time. If the passkey of a user leaked, then the user is likely to be banned, not to mention if the passkey is used in DoS. This is also the reason why PT need https to encrypt the announce.

You're suggesting that almost everyone on the internet is multi-homed.

I assumed each client has 2 address, because we usually have both IPv4 and IPv6. v4/v6 dual stack is common on seedbox providers nowadays, but multiple IPv4 address is rare. We also see ISPs using different routes for v4 and v6 traffic, resulting in different network quality. But using different v6 address in same /64 range almost always have same network quality. That's why PT wants to support dual stack, but not multi-ip from same address family.

If the tracker can't keep up with demand of IP addresses to distribute among peers, fine, it can't keep up. That might as well be a problem without any peer being multi-homed. How is this different from only having single-homed peers today? If two peers have one IPv4 address each, and they both announce, the tracker will accept those two announces from the same address family, right?

That's exactly why PT have limits on number of different peer_ids per (account, torrent) pair. So the if you use different peer_id on different addresses but share a same account, the announce that exceed the limit will simply be rejected, without updating anything on server side.

This is something trackers have to deal with regardless though.

This is a problem caused by "if we allow multiple IPs from same address family". As in current standard, tracker can choose to accept at most 1 address per address family. The point here is that even if we says in standard that tracker should allow any number IPs from same address family, it is unlikely to happen in tracker implementation.

arvidn commented 4 years ago

The benefit of announcing each IP interface separately, is that it gives you a 3-way handshake per IP you're announcing. Proving to the tracker that you do in fact have that IP address. It adds trust and seriously hinders lying about ones IP.

If I understand your argument correctly, this additional trust is not worth the cost of an additional

I can buy that the SSL handshake is expensive. I have a hard time seeing processing of the request being inherently expensive (just expensive because of a low quality implementation).

There are some unrelated ways of mitigating the SSL handshake cost, by using HTTP keep-alive, and coalesce multiple request over a single connection.

the8472 commented 4 years ago

Servers that are short on resources are encouraged to use BEP15 which can be implemented very efficiently

BEP15 is not encrypted at all, and I don't see any chance it will be used by PTs, most of which started using https years ago.

I don't think private trackers are relevant here because they have far fewer users and don't have to handle millions of connections per second like open trackers do. So they should be nowhere near hardware limitations, unlike open trackers.

I would like to see actual profiling that shows they're bottlenecked on network IO and not application code. If application code is inefficient it can be optimized without changing the standard.

One connection per announced IP is necessary for security reasons (as @arvidn explains in above post) since clients generally cannot be trusted to announce IPs they don't own. In authenticated environments that may be possible to trust them in principle, but the protocol must be designed with unauthenticated use in mind.

I can buy that the SSL handshake is expensive.

@arvidn TLS session resumption should make that much cheaper since it doesn't involve asymmetric crypto. Perhaps clients aren't making use of that right now?

wangqr commented 4 years ago

TCP connect SSL-handshake processing of an HTTP request

Actually in PT, the most expensive thing is the third one, or to be exact, the database update part. TLS termination nowadays uses AES-NI instructions, and if that's not enough, we can offload that to load balancer. However, PT needs to keep everything accurate, including calculate those scores, record all peer IPs, etc. This will result in multiple DB updates, in peer table, user table, client session table, etc. which makes the update expensive. PT tries to eliminate this by limiting announce rates.

So they should be nowhere near hardware limitations, unlike open trackers. If application code is inefficient it can be optimized without changing the standard.

Yes they could be optimized but that haven't happend yet. Take NexusPHP as example, it has been like that for years. You may blame the code or the PHP+memcached+SQL architecture being inefficient. But the reality is that each valid announce takes dozens of milliseconds to handle, as measured in php code. Remember that the database is not only used by the tracker that handles announces, but also rendering the website, which has account management / torrent searching etc..

And what I'm saying is that the code works on current standard, and we are proposing changes to the standard that breaks the usability.

I don't know how open trackers work. But I think randomly dropping announce or not recording the IP for some random peer isn't a big deal, as the client can use DHT/PEX anyway. But this is unacceptable for PT.

the8472 commented 4 years ago

That may be the case, but it's not an argument for protocol design, otherwise we would be forever limited by the least efficient, unmaintained implementation that just barely works with the status quo.

Remember that the database is not only used by the tracker that handles announces, but also rendering the website, which has account management / torrent searching etc..

In my opinion database handling doesn't need to and shouldn't happen inline with the announce code. The announce code path could keep state in memory (or memcached). A separate process can then perform batch updates of the database. It is not like their statistics-keeping requires ACID.

And what I'm saying is that the code works on current standard, and we are proposing changes to the standard that breaks the usability.

Note that at least 1 announce per address family already is in BEP7. What the updates would change only applies to clients that determine they want to announce more than 1 address per address family, which currently should be fairly rare.

So what exactly do you consider problematic here?

wangqr commented 4 years ago

One connection per announced IP is necessary for security reasons (as @arvidn explains in above post) since clients generally cannot be trusted to announce IPs they don't own.

If that is the case we should throw away DHT and PEX first, as they solely rely on trusting the IP/port provided by other peer.

wangqr commented 4 years ago

Note that at least 1 announce per address family already is in BEP7.

Yes but that is ignored. PT still relies on ip=/ipv4=/ipv6= in the first announce to work, and any subsequent announce simply get dropped due to the minimum announce interval requirement.

The change that will break things is that if we drop those params, then the tracker is forced to either take higher load of traffic, or only recording one of the v4/v6 address for a peer. qbittorrent 4.2.0 has already been banned in multiple PTs due to unable to send ipv6= parameter.

the8472 commented 4 years ago

One connection per announced IP is necessary for security reasons (as @arvidn explains in above post) since clients generally cannot be trusted to announce IPs they don't own.

If that is the case we should throw away DHT and PEX first.

That's hardly a constructive argument. But I'll be charitable: DHT does the same 3-way handshake to authenticate claims to an IP just like tracker announces do. PEX does suffer from DDoS concerns, but unlike trackers or the DHT it is only a secondary peer source, so clients can afford to be far more aggressive about discarding data from it.

Yes but that is ignored. PT still relies on ip=/ipv4=/ipv6= in the first announce to work, and any subsequent announce simply get dropped due to the minimum announce interval requirement.

Well, this can be easily solved by just buffering the announce data for a second on the server side without doing database updates. If a client does multiple announces in a short window they can be aggregated and then you can do your stats processing on the aggregated data.

As mentioned in my initial comment perhaps we could keep the IP parameters for such niche uses but mention that trackers should only rely on them in setups where clients are authenticated and ignore the otherwise. But it still is an unnecessary protocol complexity since such announces generally should be cheap as far as network traffic goes and it is only a matter of efficient handling on the application side.

Sidenote: PT is a terrible abbreviation, is it private tracker or public tracker?

wangqr commented 4 years ago

DHT does the same 3-way handshake to authenticate claims to an IP just like tracker announces do.

We are talking about a malicious peer here. IMHO if a peer decided to send victim's ip= to the tracker to cause DDoS on victim, it can also reply any DHT find_node/get_peers queries with victim's ip and port included. Is DHT doing anything to prevent this?

And, if the tracker verifies the connectivity upon receiving announce, then the current tracker standard is not vulnerable for this kind of IP spoofing at all. The victim will only receive one connection from the tracker.

PT is a terrible abbreviation, is it private tracker or public tracker?

Sorry for the confusion. By PT I mean private tracker.

arvidn commented 4 years ago

We are talking about a malicious peer here. IMHO if a peer decided to send victim's ip= to the tracker to cause DDoS on victim, it can also reply any DHT find_node/get_peers queries with victim's ip and port included. Is DHT doing anything to prevent this?

I don't think you get any magnification by doing that. Just tricking one peer to connect to the victim (instead of just doing it yourself) won't really buy you anything, will it? The amount of outbound bandwidth you need is about the same as just connecting yourself. Perhaps the peer you give the victims IP to will try it 3 times, so there's that.

What would have been effective is if you could stick someone else's IP in the peer-list returned by legitimate peers. That would give you amplification. But, as the8472 points out, that is mitigated by an equivalence of a 3-way handshake.

the8472 commented 4 years ago

Is DHT doing anything to prevent this?

It's complicated, but yes. Can we stop this argument now because it is is only a distraction in his topic. Even if the DHT were totally insecure that would not be an argument against improving security of the tracker protocol, it would only mean that the DHT also needs improvement.

And, if the tracker verifies the connectivity upon receiving announce

Which is not required by any BEP. Perhaps some language in that direction is needed anyway, since the ip field is mentioned in BEP3 after all. But in practice very few trackers and clients actually support it due to the security issues mentioned.

wangqr commented 4 years ago

Which is not required by any BEP. Perhaps some language in that direction is needed anyway, since the ip field is mentioned in BEP3 after all.

I don't know if it is required either. I just saw you said that there is some 3-way handshake going on, so I assumed there is some verifications here. Plus, NexusPHP is verifying TCP connectivity before inserting the peer IP to peer list. If some verification is required by DHT, we can also make it mandatory for trackers. Then we will have a consistent way to prevent IP spoofing instead of introducing something new and may double tracker load. The security is determined by the least secure protocol after all.

Multiple announces are allowed today. At least there's nothing in the protocol preventing it or discourage it.

As stated in the above calculation, in my opinion, putting significantly larger amount of traffic to tracker constantly is not an acceptable way to solve a potential "DDoS" that won't exceed a few percent of the tracker traffic in worst case. It is just literally DDoSing the tracker using 30x traffic of what you called a "DDoS". I have no idea how this chage made the way to BEP7 in 2017, and that looks definitely wrong from private tracker perspective. I'm not aware of any private tracker that supports them, but I am aware those dropping them or filtering them using web app firewall.

wangqr commented 4 years ago

And, I'm only concered about private torrents here. I was linked here from qbittorrent/qBittorrent#11819. Removing ip= from public torrents have no impact in my use case.

arvidn commented 4 years ago

@wangqr can you explain how you come to the conclusion that this would double tracker load? What portions of users have IPv6 connectivity? I don't means just have an IPv6 address that's not connected to the internet, but actually have IPv6 connectivity. i.e. pass the trackers NAT check.

I would expect it to be negligible.

wangqr commented 4 years ago

What portions of users have IPv6 connectivity?

I've quickly checked a new torrent that has 100+ seeders from the private tracker web page, which shows the address family.

And the top 20 users sorted by outgoing traffic are all v4/v6 dual stak. Note that this by no means represent the IPv6 adoptation rate in public trackers.

I've also tried to download another torrent locally so I can see the real IPs. Among the 30 peers I'm downloading from, about half of them are using IP from OVH or online.net, both are dedicated server provider with v6 support. v4/v6 are about half and half and varies over time. I also know that here in US, both AT&T and Comcast's home cable network comes with one public IPv4 and a public reachable /64 IPv6-PD. More on Wikipedia

arvidn commented 4 years ago

About 70 users are v4/v6 dual stack.

It's not trivial to know whether you have internet connectivity over IPv6 or if you just have an IPv6 address on your local network. libtorrent (qBT and Deluge) for instance, are not very sophisticated in determining whether and what to pass as &ipv6= tracker parameter. In other words, just the fact that a client is announcing an IPv6 address, isn't necessarily a very strong indication of them having IPv6 internet connectivity.

Do you know how many of the peers are announcing twice? once for each address family? (libtorrent has been doing this for years)

wangqr commented 4 years ago

Do you know how many of the peers are announcing twice?

Here is a screenshot of the peer software from the 100+ seeders torrent mentioned above, sorted by outgoing traffic of that specific torrent. Hope this can gives an overview of what people are using here. Sorry for unable to provide the full screenshot. Notice that qb 4.2.0 has been banned by this site. And the site does not support the double announce. img

In other words, just the fact that a client is announcing an IPv6 address, isn't necessarily a very strong indication of them having IPv6 internet connectivity.

The site is checking the type of address (so it drops link-local address, shows tereno/native, and country) and doing TCP connectivity test here.

arvidn commented 4 years ago

The site is checking the type of address (so it drops link-local address, shows tereno/native, and country) and doing TCP connectivity test here.

So, performing the NAT-check is cheaper than accepting a second announce? That doesn't sound right.

arvidn commented 4 years ago

I don't really see what that that screenshots adds.

wangqr commented 4 years ago

So, performing the NAT-check is cheaper than accepting a second announce? That doesn't sound right.

That's one TCP vs one HTTP request

I don't really see what that that screenshots adds.

I don't know how many of the peers are announcing twice, so I just provided that screenshot

Rootax commented 4 years ago

About IPv6 connectivity, it's not negligible IMHO. I see more and more ipv6 addresses in my client. In France (and Europe AFAIK) more and more ISPs are providing ipv6 addresses, and only a shared (between multiple subscribers) ipv4 behind a GC-NAT. Most of the isp's modem/router doesn't filter ipv6, so they will have a ipv6 connectivity, and no ipv4 connectivity (thx to GC-NAT). 5 years ago nobody cared about ipv6, but now it's here and it's more and more the norm in some part of the world.

arvidn commented 4 years ago

@Rootax I tried (though, no very hard) to find some data, but couldn't find anything at a first glance. Do you have a reference by chance?

arvidn commented 4 years ago

That's one TCP vs one HTTP request

Makes sense. The additional SSL handshake for sure makes it inherently more expensive.

I don't know how many of the peers are announcing twice, so I just provided that screenshot

It's possible all the peers that are IPv6-only, are just announcing both their IPv4 and IPv6 address then, but the former is clobbered.

Rootax commented 4 years ago

@Rootax I tried (though, no very hard) to find some data, but couldn't find anything at a first glance. Do you have a reference by chance?

About ipv6 usage ? Sure. it's specific to France but :

Approx 40% of request to google services are done in ipv6 :

https://lafibre.info/index.php?action=dlattach;topic=40573.0;attach=77489;image

And here you have the ipv6 usage by isp :

https://lafibre.info/images/ipv6/201911_arcep_barometre_ipv6_fixe_4.png

Orange is the number one isp in france, with 85 to 100% ipv6 activation for xdsl and ftth (fiber). SFR is late to the game but they will move in 2020. Same thing for Bouygues. Free as you can see enable ipv6 by default, and ipv4 behind a GC-NAT. Bouygues will do the same when they deploy IPv6.

It was the numbers in june 2019.

More recent numbers, including the mobile networks of the isps are here : https://lafibre.info/ipv6/stats-ipv6-apnic/

So 92% for Free, 59% for Orange, 36% for Bouygues, 0.4% for sfr... But you see the trend in the graphs.

ghost commented 4 years ago

How does a tracker deal with the double count issue? Afaik majority of the trackers will count a peer as unique by their IP address. If you double announce, there's no way to prevent a double counting of stats.

arvidn commented 4 years ago

A tracker could use the key to identify the peer, or it could have something like a unique "passcode" as part of the URL to identify a peer.

A tracker that accepts announces for both IPv6 and IPv4 would have to expect a single peer announcing once to each one.

ghost commented 4 years ago

I think client should check if one endpoint already successfully announced. If so, the subsequent from another endpoint should contain nulled value for downloaded & uploaded. If tracker uses key to identify peer and it remains same for both IPs, then tracker will treat them as same peer. If they're treated as same then the tracker will store either one of the IPs and discard the other. The swarm will end up getting only one of the IPs.

arvidn commented 4 years ago

If they're treated as same then the tracker will store either one of the IPs and discard the other. The swarm will end up getting only one of the IPs.

There's nothing forcing the tracker to behave that way, and it would seem sub-optimal. Before the tracker is made to support receiving announces from multiple endpoints, it should probably fix that behavior.

wangqr commented 4 years ago

Is it possible to revert the breaking change in libtorrent for now, and add it back only if the above proposal actually is widely accepted and get its way into BEP? [Removed]

ghost commented 4 years ago

There's nothing forcing the tracker to behave that way

You’re right. But most trackers that are currently running were not coded to accept announces from same peer through multiple endpoints. Most trackers would treat them as unique peers and end up with double counting the stats. Also the ‘&key’ is not sent by some clients so I doubt if any tracker uses that to track same peers. This kind of small key could be duplicate as well among millions of peers.

arvidn commented 4 years ago

Is it possible to revert the breaking change in libtorrent for now, and add it back only if the above proposal actually is widely accepted and get its way into BEP?

Which change is it that you consider "breaking"?

The updated BEP documents existing practice. libtorrent has had this behavior (but in a less generic for, only covering the most common cases) for years. Avoiding the &ipv4= and &ipv6= parameters has also been existing practice for years.

wangqr commented 4 years ago

Which change is it that you consider "breaking"?

Removing &ipv4= parameter

arvidn commented 4 years ago

that's actually still there. See here and here.

It's only enabled for private torrents, if anonymous_mode is not enabled and if we know the IP address (there's no point in sending the IP from the local network for instance).

wangqr commented 4 years ago

that's actually still there. See here and here.

My apologize for the previous comment. Just gave v1.2.4 a try and it is working.