BEP: Extensions supported by tracker

kovalensky commented 1 year ago

Here's a proposal in .rst format:

https://gist.github.com/kovalensky/2d0e6e6b52c50cc96b8bdc4809f4db44

kovalensky commented 1 year ago

Any opinions @ssiloti @the8472 @arvidn ?

arvidn commented 1 year ago

How would you expect clients to act on this tracker response? Normally, clients just include extra parameters in the GET request as best-effort, assuming they will be ignored if not supported.

Is that problematic?

In my mind, one of the most significant improvements that could be supported by something like this would be if the tracker responds with its max HTTP pipeline depth, allowing clients to pipeline announces to different torrents over the same connection.

kovalensky commented 1 year ago

@arvidn Clients would send announce requests containing only the GET parameters that correspond to extensions listed in the tracker's response. For instance, if the tracker doesn't include 'bt_v2' in the extension list, it indicates non-support. This spares clients from sending a redundant second request with truncated 'info_v2' hashes. Same for ipv6 connectivity.

The concept of pipelining is indeed quite useful. As far as I know, there's no standardized limit defined in the RFC, leaving it up to individual implementations. Most servers have a reasonable depth, and it's worth considering whether pipelining should be linked to HTTP/2 multiplexing in this context as well?

kovalensky commented 1 year ago

In any case, I see no reason not to include extensions supported by the tracker server in the list.

arvidn commented 1 year ago

if the tracker doesn't include 'bt_v2' in the extension list, it indicates non-support.

The point of v2 hashes being truncated is to preserve the DHT protocol and the tracker protocol. The tracker doesn't need to recognise whether the hash is a sha1-hash or a truncated sha-256 hash. I suppose if you have a whitelist of valid hashes, you could "not support" v2 by not whitelisting them. Is that what you mean?
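
To illustrate why the tracker cannot tell the two apart, here is a minimal sketch of the identifiers a client derives for a hybrid torrent. The function name and the sample bytes are hypothetical; the input is assumed to be the bencoded info dictionary.

```python
import hashlib


def tracker_hashes(info_bytes: bytes) -> dict[str, bytes]:
    """Derive the identifiers for a hybrid torrent (illustrative helper).

    info_bytes is assumed to be the bencoded info dictionary.
    """
    v1 = hashlib.sha1(info_bytes).digest()         # 20 bytes
    v2_full = hashlib.sha256(info_bytes).digest()  # 32 bytes
    return {
        "v1": v1,
        "v2_full": v2_full,
        # Trackers and the DHT only ever see the first 20 bytes.
        "v2_truncated": v2_full[:20],
    }


# Placeholder bytes, not a real info dictionary.
hashes = tracker_hashes(b"d4:name7:example12:piece lengthi16384ee")
```

Both `v1` and `v2_truncated` are opaque 20-byte strings on the wire, so an unmodified tracker handles them identically.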

Same for ipv6 connectivity.

In order for there to be a second announce sent via IPv6, the tracker hostname must resolve to an IPv6 address and it must be connectable via that IP address. What does it mean for a tracker to have its hostname resolve to an IPv6 address, and be reachable via IPv6, but not support IPv6?

The concept of pipelining is indeed quite useful. As far as I know, there's no standardized limit defined in the RFC, leaving it up to individual implementations. Most servers have a reasonable depth

I seem to recall that, 15 years ago, Apache had a quite unreasonable default of 5 pipelined requests. Perhaps this situation has improved.

kovalensky commented 1 year ago

The point of v2 hashes being truncated is to preserve the DHT protocol and the tracker protocol. The tracker doesn't need to recognise whether the hash is a sha1-hash or a truncated sha-256 hash. I suppose if you have a whitelist of valid hashes, you could "not support" v2 by not whitelisting them. Is that what you mean?

Clients initiate two parallel requests for hybrid torrents. While refining TorrentPier's announce script, we encountered a challenge in obtaining the complete v2 hash for finding it in the database and updating statistics. Instead, we resorted to using a 'LIKE' SQL operator to find matching hashes. Unfortunately, this approach proved to be less performant than anticipated, especially with large tables, due to index bypass.

Furthermore, during discussions with the administration of a prominent tracker regarding the implementation of the BitTorrent v2 protocol, particularly in allowing hybrids, they raised a concern about the potential strain posed by parallel hybrid requests, which could add to the server load.

For trackers endorsing the v2 protocol, a significant reduction in the number of requests for hybrids can be achieved by exclusively announcing the full v2 hash, since both hashes are derived from the same info dictionary.

Regarding IPv6, you are absolutely correct. It slipped from my mind for a moment.

The default setting for HTTP/2 concurrent streams in Nginx is 128. Unfortunately, I wasn't able to locate precise information regarding Apache.

the8472 commented 1 year ago

Furthermore, during discussions with the administration of a prominent tracker regarding the implementation of the BitTorrent v2 protocol, particularly in allowing hybrids, they raised a concern about the potential strain posed by parallel hybrid requests, which could add to the server load.

A client still has to do both requests to obtain peer lists for both infohashes to be able to tell v1 and v2 peers apart and to know which hash to send in the handshake.

While refining TorrentPier's announce script, we encountered a challenge in obtaining the complete v2 hash for finding it in the database and updating statistics. Instead, we resorted to using a 'LIKE' SQL operator to find matching hashes. Unfortunately, this approach proved to be less performant than anticipated, especially with large tables, due to index bypass.

This seems like an implementation detail of the particular tracker that could easily be worked around in various ways and shouldn't require protocol changes. Potential solutions I can think of are having truncated -> full hash lookup caches or structuring the database table to have efficient prefix lookups, e.g. by using BTree indices.
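
As a rough sketch of the prefix-lookup idea, assuming the full 32-byte v2 hash is stored as an indexed BLOB column: a range query over the 20-byte prefix can use the B-tree index, unlike a pattern match. The schema, table name and helper below are hypothetical and use SQLite only for illustration.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: the PRIMARY KEY gives the BLOB column a B-tree index.
conn.execute("CREATE TABLE torrents (v2_hash BLOB PRIMARY KEY, name TEXT)")

full = hashlib.sha256(b"example info dict").digest()
conn.execute("INSERT INTO torrents VALUES (?, ?)", (full, "example"))


def find_by_truncated(conn: sqlite3.Connection, truncated: bytes) -> list:
    """Find rows whose 32-byte hash starts with the announced 20-byte prefix."""
    low = truncated + b"\x00" * 12   # smallest 32-byte value with this prefix
    high = truncated + b"\xff" * 12  # largest 32-byte value with this prefix
    return conn.execute(
        "SELECT name, v2_hash FROM torrents WHERE v2_hash BETWEEN ? AND ?",
        (low, high),
    ).fetchall()


rows = find_by_truncated(conn, full[:20])
```

Whether this beats the existing query depends on the database engine and column type, so it is worth checking the query plan on the real schema.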

kovalensky commented 1 year ago

A client still has to do both requests to obtain peer lists for both infohashes to be able to tell v1 and v2 peers apart and to know which hash to send in the handshake.

For hybrid torrents, the info hashes remain consistent as they are registered under the same topic for the tracker. AFAIK for peer handshakes, connection upgrading helps in this scenario.

This seems like an implementation detail of the particular tracker that could easily be worked around in various ways and shouldn't require protocol changes.

Overall, there's nothing here that can't be solved, but is there any need to keep truncating info-hashes in the future? Rather than adding complexity with workarounds, wouldn't it be beneficial to provide full hashes for trackers requesting them?

Also, my proposal is aimed at reducing the load caused by additional requests or queries.

the8472 commented 1 year ago

For hybrid torrents, the info hashes remain consistent as they are registered under the same topic for the tracker. AFAIK for peer handshakes, connection upgrading helps in this scenario.

There seems to be some misunderstanding here.

If a torrent is hybrid it means current clients should be able to support both v1 and v2 peers. Which means they need to obtain separate peer lists because in theory these swarms could be completely disjoint - a bunch of (legacy) v1-only and (future) v2-only peers that all happen to have ingested that hybrid torrent. It also has to do that to announce its own v1 and v2 capabilities.

That in turn means a hybrid peer has to contact the tracker twice to get those peer lists to know which peer it has to contact with the v1 hash and which with the v2 hash. The peers might support both, but this is not guaranteed.
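
A minimal sketch of what that looks like from the client side, assuming an HTTP tracker and the usual announce parameters. The function name, tracker URL and values are made up for illustration.

```python
from urllib.parse import urlencode


def announce_urls(tracker: str, v1_hash: bytes, v2_truncated: bytes,
                  peer_id: bytes, port: int) -> list[str]:
    """Build one announce URL per swarm for a hybrid torrent (illustrative)."""
    urls = []
    for info_hash in (v1_hash, v2_truncated):
        # urlencode percent-encodes bytes values, as info_hash requires.
        query = urlencode({
            "info_hash": info_hash,
            "peer_id": peer_id,
            "port": port,
            "uploaded": 0,
            "downloaded": 0,
            "left": 0,
            "compact": 1,
        })
        urls.append(f"{tracker}?{query}")
    return urls


urls = announce_urls(
    "http://tracker.example/announce",  # hypothetical tracker
    b"\x01" * 20,                       # placeholder v1 hash
    b"\x02" * 20,                       # placeholder truncated v2 hash
    b"-XX0000-000000000000",            # placeholder peer id
    6881,
)
```

Each response yields the peer list for one swarm, which tells the client which hash to use in the handshake with those peers.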

but is there any need to keep truncating info-hashes in the future? Rather than adding complexity with workarounds, wouldn't it be beneficial to provide full hashes for trackers requesting them?

There are no plans to send the full hashes because peer discovery mechanisms (trackers, DHT) have essentially no need for the full infohashes. Internally a tracker can entirely deal with the truncated form without loss of functionality. The full length is only really needed for display purposes or if someone really needs those 128 bits of collision resistance for whatever reason. But neither of those things is needed for a tracker, they're part of an indexing site, which is a separate concept.

And I think you're overstating the complexity. Prefix lookups ought to be trivial. They're simple string operations. Sure, a particular implementation may happen to have chosen a data structure that doesn't support it, but that doesn't mean alternatives are inherently more complex. Changing this is a one-time cost in effort, not an ongoing maintenance cost.

kovalensky commented 1 year ago

If a torrent is hybrid it means current clients should be able to support both v1 and v2 peers. Which means they need to obtain separate peer lists because in theory these swarms could be completely disjoint - a bunch of (legacy) v1-only and (future) v2-only peers that all happen to have ingested that hybrid torrent.

Since most clients support v1, we settled on this approach for hybrids: we record statistics (peer uploads and downloads) only for v1 hash announces and respond as usual. For v2 announces, the response is the same as the one we would send for the v1 hash (we use a cached response with the same peer list). This applies only to hybrid torrents; v2-only torrents are handled through separate statistics.

Could you please clarify what you mean by v2-only (future) peers? Yes, there was some confusion regarding the terms "tracker" and "indexer". I think that most people commonly use "tracker" to refer to both, as an indexer typically operates with its own tracker software.

There are no plans to send the full hashes because peer discovery mechanisms (trackers, DHT) have essentially no need for the full info-hashes.

My concerns about the adoption of BitTorrent v2 by trackers (indexers) are as follows:

The indexer's tracker is unable to distinguish the provided info-hash version. Consequently:

1) It cannot gracefully reject them without querying the database, as it could be a v1 hash.

2) We lack means to instruct the client not to send these hashes to us, reducing the load.

3) There's no way to signal that full hashes should be sent, which would resolve both of the previous issues.

All of these issues are interlinked.

I'm unsure of any potential issues that could arise from sending full v2 hashes to trackers (specifically indexers) that explicitly request them. This action shouldn't interfere with existing client implementations of BEP52.

the8472 commented 1 year ago

Since most clients support v1

For now. We have to plan for the long term where eventually pure v2 clients may be written.

For v2 announces, the response is the same as the one we would send for the v1 hash (we use a cached response with the same peer list).

Yeah that's not really correct. Probably not particularly harmful at the moment but semantically incorrect. And specs certainly shouldn't be written with such behavior in mind.

Could you please clarify what you mean by v2-only (future) peers?

The torrent file may be hybrid but clients might only support v2 at some point in the future. The hybrid format is meant as a way to enable gradual transition towards that future.

as an indexer typically operates with its own tracker software.

For private torrent sites.

The indexer's tracker is unable to distinguish the provided info-hash version. Consequently:

It cannot gracefully reject them without querying the database, as it could be a v1 hash.

We lack means to instruct the client not to send these hashes to us, reducing the load.

There's no way to signal that full hashes should be sent, which would resolve both of the previous issues.

1: I don't see what makes v2 so special here about database hits. If database lookups are costly your server process should have some in-memory negative hit cache (something like a HashMap<[u8; 20], Time>) to cheaply reject frequently-asked-but-invalid hashes. This would also be true for v1 torrents (e.g. expired ones). Basic software design.

2: As explained above, you can't do that because clients operate on the model that v1 and v2 torrents are distinct swarms. So they must announce both for hybrid torrents. You're basically operating on an incompatible model.

3: That's a quite peculiar argument. You'd just use those extra bits to tell v1 and v2 torrents apart based on the length of the hash and fast-reject them? That's not really why we picked a longer hash. So as far as purpose goes, there still is not really one for conveying those extra bits to a tracker.
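
To make point 1 concrete, here is a minimal sketch of a negative-hit cache, with a Python dict standing in for the `HashMap<[u8; 20], Time>`. The class name, TTL and injectable clock are assumptions for illustration; a production version would also bound the map's size.

```python
import time


class NegativeHitCache:
    """Remembers recently rejected info-hashes so repeats skip the database."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._expires: dict[bytes, float] = {}

    def mark_unknown(self, info_hash: bytes) -> None:
        """Record that the database had no entry for this hash."""
        self._expires[info_hash] = self._clock() + self._ttl

    def is_known_bad(self, info_hash: bytes) -> bool:
        """True if the hash was rejected recently and the entry is still fresh."""
        expiry = self._expires.get(info_hash)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry is stale: drop it so the database is consulted again.
            del self._expires[info_hash]
            return False
        return True
```

An announce handler would check `is_known_bad()` first, and call `mark_unknown()` only after a database miss. This works the same for v1 hashes and truncated v2 hashes, since both are 20 bytes.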

Anyway, what magnitude of requests per second and how many database rows are we talking about? I want to get an idea whether this really is a protocol issue or an application issue.

kovalensky commented 1 year ago

The torrent file may be hybrid but clients might only support v2 at some point in the future. The hybrid format is meant as a way to enable gradual transition towards that future.

Okay, I didn't expect that change. In fact, I didn't expect any alterations to v2 from now on; thanks for clarifying this. I observed this feature in BiglyBT but was uncertain if it would eventually become a standard.

Sending a truncated v2 hash introduces unnecessary complexity to the development process.

We considered preserving the current model for now, as we're uncertain about future modifications. Even if accounting stats for hybrid torrents were to fail at some point due to the absence of v1 announces for hybrids, it would serve as a strong motivation to either update the tracker software or transition to v2-only torrents.

Nevertheless, my BEP proposal remains.

arvidn commented 1 year ago

Okay, I didn't expect that change. In fact, I didn't expect any alterations to v2 from now on; thanks for clarifying this.

What are you referring to as having changed?

I observed this feature in BiglyBT but was uncertain if it would eventually become a standard.

It's standard in so far as it's documented here.

Sending a truncated v2 hash introduces unnecessary complexity to the development process.

I don't think you've demonstrated that, but perhaps the details are in exactly what you mean by "the development process". As the8472 pointed out, a tracker (as defined by both BEP52 and the original BEP3) doesn't need to concern itself with full sha-256 info-hashes at all. An index probably would though. If you need a mapping from truncated v2 -> full v2 or vice versa, that sounds like one more column in a database table. So, some complexity for you if you need it. But weigh that against the simplification for everyone who doesn't. They can use unmodified trackers and it just works.
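
A sketch of the "one more column" idea, assuming an index site that stores full v2 hashes and wants to resolve the 20-byte value seen in announces. The table, column and function names are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: store the truncated form next to the full hash.
conn.execute(
    "CREATE TABLE torrents (v2_full BLOB PRIMARY KEY, v2_truncated BLOB NOT NULL)"
)
conn.execute("CREATE INDEX idx_v2_truncated ON torrents (v2_truncated)")


def register(conn: sqlite3.Connection, v2_full: bytes) -> None:
    """Insert a torrent, deriving the truncated column from the full hash."""
    conn.execute("INSERT INTO torrents VALUES (?, ?)", (v2_full, v2_full[:20]))


def full_hash_for(conn: sqlite3.Connection, announced: bytes):
    """Map an announced 20-byte hash back to the full hash, or None."""
    row = conn.execute(
        "SELECT v2_full FROM torrents WHERE v2_truncated = ?", (announced,)
    ).fetchone()
    return row[0] if row else None


full = hashlib.sha256(b"example info dict").digest()
register(conn, full)
```

The lookup is an exact match on an indexed column, so the announce path never needs a pattern match.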

Nevertheless, my BEP proposal remains.

I think it would be helpful to include some (strong) motivations in the BEP. Surely you wouldn't expect announces to be two-phase, where these extensions are looked up (best-effort) first, and then announced.

As far as I can tell, I don't see how or why any client would change behavior based on the v2 signal you propose. This could also probably be clarified, to make a stronger argument.

the8472 commented 1 year ago

Perhaps we could say that a client should keep track of the v1 and v2 announce interval separately. Then a tracker can set a higher announce interval for a v2 hash if most clients are using v1 anyway (or support upgrades) which should lessen most of the load. Or vice versa, if v2 support is widespread the tracker can decrease the v1 rate.
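
Client-side, that could look roughly like the sketch below: one timer per announced hash, each driven by the `interval` from its own tracker response. The class and method names are hypothetical, and the interval values are arbitrary examples.

```python
from dataclasses import dataclass, field


@dataclass
class AnnounceSchedule:
    """Tracks the next announce time separately for each info-hash."""

    next_announce: dict[str, float] = field(default_factory=dict)

    def record_response(self, key: str, now: float, interval: float) -> None:
        """Store the interval the tracker returned for this hash."""
        self.next_announce[key] = now + interval

    def due(self, now: float) -> list[str]:
        """Return the hashes whose announce interval has elapsed."""
        return sorted(k for k, t in self.next_announce.items() if now >= t)


schedule = AnnounceSchedule()
schedule.record_response("v1", now=0.0, interval=1800.0)  # tracker favours v1
schedule.record_response("v2", now=0.0, interval=7200.0)  # v2 announced less often
```

With these example values, the v1 hash is re-announced four times as often as the v2 hash, which is the load reduction described above.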

bittorrent / bittorrent.org

BEP: Extensions supported by tracker #145