anacrolix / torrent

Full-featured BitTorrent client package and utilities
Mozilla Public License 2.0
5.58k stars 630 forks

Support BitTorrent v2 spec: BEP 52 #175

Open ewhal opened 7 years ago

ewhal commented 7 years ago

https://github.com/arvidn/libtorrent/issues/2197 https://github.com/atomashpolskiy/bt/issues/28

anacrolix commented 7 years ago

But why?

christian-roggia commented 7 years ago

Here you will find a good reddit TL;DR about the changes and improvements of The BitTorrent Protocol Specification v2: https://www.reddit.com/r/programming/comments/6safyl/the_bittorrent_protocol_specification_v2/

For more detailed information read the bittorrent.org BEP: http://bittorrent.org/beps/bep_0052.html

The most important improvements are:

I'm not sure this is really needed right now, but it is definitely something that sooner or later should be fully implemented (with the help of the community).

anacrolix commented 3 years ago

A useful link from a related issue: https://blog.libtorrent.org/2020/09/bittorrent-v2/.

ghost commented 3 years ago

@anacrolix when will you support this?

anacrolix commented 3 years ago

Not until I see some uptake in the wild. Currently it appears to be zero.

KyleSanderson commented 2 years ago

Any update on this one? This is absolutely "in the wild".

anacrolix commented 2 years ago

I am considering this, there's been an uptick in the community, I think usage might start to appear.

Is there any interest in funding this support?

KyleSanderson commented 2 years ago

I am considering this, there's been an uptick in the community, I think usage might start to appear.

Is there any interest in funding this support?

Specifically I have a PR into a project that already uses this library. https://github.com/autobrr/autobrr/pull/491

Parsing the contents breaks on a handful of sites, because the clients support v2 well but the library does not. As v2 has become widely accepted and mainstream over the last year, this is only going to become more prevalent as time progresses.

So, similar situation to you, as a user there's pain. I have an old parser for v1 I can dust off, but then the maintenance burden shifts.

kovalensky commented 1 year ago

When? Tell me honey when? I'm just too hungry for this, give me, give me that juicy BitTorrent v2 support, make it faster, deeper in implementation details, make it swim inside my web server, take all of my resources, make it bigger.

But seriously though, 6 years have passed. Everyone has put their heads in the sand, saying they need to see others try it first, but the others say the same, and as a result users are losing a great feature.

fiatjaf commented 1 year ago

How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.

KyleSanderson commented 1 year ago

How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.

That's not how the world works.

Transmission v4 just went GA with BT2 support, which means that every major client has full support for this now.

fiatjaf commented 1 year ago

Thank you for teaching me how the world works with your 6-year protocol update. Since you know how the world works, tell me for how many years the BT1 protocol will still be supported alongside BT2 because some people and apps won't switch.

anacrolix commented 1 year ago

I've not said I won't do it, but time is money and I have a lot to do. BEP 52 messes with the most fundamental part of BitTorrent so a transition will always be difficult. BEP 52 isn't really worthwhile for the most traditional use case for BitTorrent, downloading everything up front. The story is very different for indexers, search, and ephemeral use of BitTorrent for example in hosting websites, streaming etc. Again, I can accelerate this with some support, and/or can always take well considered PRs.

kovalensky commented 1 year ago

How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.

That's not how the world works.

Transmission v4 just went GA with BT2 support, which means that every major client has full support for this now.

This is great news, but the main task now is not implementing the protocol itself but implementing its promised features, such as cross-swarm seeding with clients keeping a hash DB in memory. If clients searching the DHT could ask other clients whether they have files with a given root hash, and the requested clients found it in their hash DB, they would seed that file, reviving dead torrents and increasing connectivity and decentralization. Another feature was deduplication of files, like swarm merging, reducing their size on disk by storing the relationships in the hash DB.

As with most open-source projects, there are more great features than manpower to build them.

balupton commented 1 year ago

Could those with the ability to implement this propose a budget that they would need? Perhaps it can be crowd-funded.

anacrolix commented 1 year ago

It would take me about 2-3 weeks but could extend a bit more. It would likely spill over into anacrolix/dht a bit, require some special flags for bencoding, probably some refactoring of storage. There might also be some refactoring of peer connections, to handle the different hash/info requests. There would need to be a fair few tests. Maybe $3k?

kovalensky commented 1 year ago

Could those with the ability to implement this propose a budget that they would need? Perhaps it can be crowd-funded.

I don't mean to offend @anacrolix, but the current situation is that we need this to be implemented in something popular to get others' attention to do it. I wrote to the libtorrent maintainer. His library is used in qBittorrent, the second most popular open-source client, and in Deluge, and he has co-written a lot of BEPs. He said he would help with a PR, but his limiting factor right now is time, and he is rewriting caching mechanisms, so even a bounty wouldn't help. Therefore I am looking for a C++/Boost developer to discuss the task further and start crowdfunding. I'll keep this topic updated.

anacrolix commented 1 year ago

Maybe I've missed something, but this GitHub issue tracks BitTorrent v2 support in anacrolix/torrent, which is written in Go. Completion of this issue will be when anacrolix/torrent implements BEP 52.

izissise commented 1 year ago

Hello, is there any news on this?

anacrolix commented 1 year ago

This is blocked on funding currently. The details are in an earlier comment.

anacrolix commented 1 year ago

I'm pretty keen to do this but I can't justify working on it.

kovalensky commented 1 year ago

I'm pretty keen to do this but I can't justify working on it.

[image: IMG_20231117_235300_431]

weebney commented 11 months ago

Would be happy to help implementing this as I've been given a worthwhile amount of grant money to explore BEP 52 in 2024.

Edit: please contact me after the holidays. Happy 2024, everyone—hopefully we can get cracking on this soon.

anacrolix commented 10 months ago

@weebney what's your interest in supporting this development? I can't find any PM details, please use mine if there's a private aspect to your contribution.

I have a sponsor for this feature, I intend to start work on it soon. Additional sponsorship and help is welcome.

I've also updated the IssueHunt details, https://github.com/anacrolix/torrent/issues/138 (Webtorrent support) was previously funded in a similar way.

weebney commented 10 months ago

If you've got this covered, I'm likely to put my effort into other projects that need attention. Let me know if you need any help though—would be happy to contribute here on a personal basis.

anacrolix commented 8 months ago

Development on this has begun at https://github.com/anacrolix/torrent/tree/bittorrent-v2.

Some interesting things I've discovered since working on this:

Other major clients (Transmission 4, qBittorrent 4.4+) claim to have BitTorrent v2 support. It does not work in my testing. Both of those clients can consume hybrid torrents. They do not work with exclusively v2 torrents. Neither supports creating hybrid or v2 torrents either (which is weird because qBittorrent claims to support it but I definitely don't see it). I'm using Transmission 4.0.5 and qBittorrent 4.6.3.

There's a lot of confusion about needing "special" support by trackers/DHT etc. My reading is that's all nonsense. You could add support in trackers to automatically combine swarms but it would not be trivial. If the client is aware of multiple swarms it can do this for both DHT and trackers. I have numerous downstream projects that rely on a single infohash and I believe there's a simple way to migrate this forward.

The file piece alignment sounds great, but it is a total nuisance trying to rejig 10+ years of structuring everything around pieces rather than files. It's likely still a net win. There's a lot of concern about it being inefficient for lots of small files. I read that as files smaller than the piece length, but I think actually it's only files that are smaller than the block size, 16 KiB, which is much more palatable.

kovalensky commented 8 months ago

which is weird because qBittorrent claims to support it but I definitely don't see it

BitTorrent v2 added to libtorrent in 2.* branch. When downloading qBittorrent from Fosshub, you have to choose "lt20, qt6" version. They said they will drop lt12 in 5.0.0 as well as Windows 7 support.

There's a lot of confusion about needing "special" support by trackers/DHT etc.

One wave of confusion started when people didn't find a way to express themselves in terms of trackers (public, private, and public with rating). For reference, Rutracker doesn't allow hybrids, because of padding files in the file list (the fix for this is two lines of code, btw) and double announces.

Nor does it support v2-only torrents, because that case needs specific code. They are based on an old version of the TorrentPier engine, which is the first engine supporting BitTorrent v2 (stats, file hash display, etc.), but they will never update due to the custom modifications they made.

In general, the discrediting of the protocol started when developers of private and public-with-rating trackers, out of laziness about implementing it (I can relate though, it took me two weeks of debugging just to re-implement a v2-compatible announcer with stats), started to come up with counterarguments, and it accelerated with exaggerations from users completely unfamiliar with the protocol.

anacrolix commented 8 months ago

BitTorrent v2 added to libtorrent in 2.* branch. When downloading qBittorrent from Fosshub, you have to choose "lt20, qt6" version. They said they will drop lt12 in 5.0.0 as well as Windows 7 support.

Thanks! Very helpful. I see the lt20 version you mentioned, it's also available in homebrew.

One wave of confusion started when people didn't find a way to express themselves in terms of trackers (public, private, and public with rating). For reference, Rutracker doesn't allow hybrids, because of padding files in the file list (the fix for this is two lines of code, btw) and double announces.

Yeah it's not been trivial to fix up the assumption that files are packed. It's also harder because v2 hashes unpadded pieces, whereas v1 hashes including the padding files. So pieces essentially have different lengths depending on whether a torrent is v1 or v2.
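To make the length difference concrete, a small sketch of the arithmetic: in a hybrid torrent the v1 side pads each file out to the next piece boundary, and that padding is part of what v1 hashes, while the v2 side only hashes the file's own bytes. `padLen` is a hypothetical helper, not something from anacrolix/torrent.

```go
package main

import "fmt"

// padLen returns how many padding bytes follow a file of fileLen bytes so
// that the next file starts on a piece boundary. Hypothetical helper.
func padLen(fileLen, pieceLen int64) int64 {
	rem := fileLen % pieceLen
	if rem == 0 {
		return 0 // already aligned, no padding needed
	}
	return pieceLen - rem
}

func main() {
	const pieceLen = 64 * 1024
	fileLen := int64(100000)
	pad := padLen(fileLen, pieceLen)
	// v1 view: the file's last piece covers file bytes plus padding.
	// v2 view: only the file bytes themselves are hashed.
	fmt.Println("padding bytes:", pad)
	fmt.Println("bytes covered by v1 pieces:", fileLen+pad)
	fmt.Println("bytes covered by v2 hashing:", fileLen)
}
```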

In general, the discrediting of the protocol started when developers of private and public-with-rating trackers, out of laziness about implementing it (I can relate though, it took me two weeks of debugging just to re-implement a v2-compatible announcer with stats), started to come up with counterarguments, and it accelerated with exaggerations from users completely unfamiliar with the protocol.

That's handy to know. I have gotten it working, but there are lots of corner cases I expect to need to smooth over. It's complex enough that I'm starting to think Go is not sufficient for this, it would have been much easier to port a client in Rust.

The branch above can now download both hybrid and pure v2 torrents.

go install github.com/anacrolix/torrent/cmd/torrent@bittorrent-v2
wget https://libtorrent.org/bittorrent-v2-test.torrent
torrent download bittorrent-v2-test.torrent

Similarly you should be able to download the hybrid torrent at https://blog.libtorrent.org/2020/09/bittorrent-v2/, or hybrid torrents that are available elsewhere (they're much easier to find than pure v2 torrents).

It's very early days, and there will definitely be crashes and bugs. I also expect that pure v2 magnet links won't work yet.

anacrolix commented 8 months ago

Forgot to push the updated branch https://github.com/anacrolix/torrent/tree/bittorrent-v2. It's now pushed.

anacrolix commented 8 months ago

Support is now in master. Please try it out. It's not complete, but hybrid and pure v2 torrents should now be supported. There's missing support in some tooling. There's no hybrid or v2 torrent creator. Some of the storage backends may do the wrong thing. There's a few shortcuts taken in the protocol for now, they should improve over time.

weebney commented 8 months ago

Thanks @anacrolix Let me know if you need any support; I'm back to "real" work soon and should still have some time to put towards a more complete implementation

anacrolix commented 8 months ago

If you have a downstream project, update to master and let me know if anything isn't working.

A few good hybrid and pure v2 torrents would be useful as test cases too. I have pinched a few from my DHT indexer which I've updated with v2 support, but there aren't many popular pure v2 torrents in the wild.

It would also be nice to have someone go over and/or test the dual-swarm support: check that announces are working on both v1 and v2 simultaneously for trackers and DHT.

I've done zero testing of BitTorrent v2 with WebRTC. There's very little extra needed to link it in if at all. I have no idea what the state of v2 support is in webtorrent.

Not yet supported:

Some cool things I'd like to have:

anacrolix commented 4 months ago

Support for BitTorrent v2 was included in v1.56.0, thanks to a generous sponsor. The above features aren't yet implemented, but aren't necessary for compatibility with the network. I'm open to more sponsorship to complete some of those.

balupton commented 4 months ago

Posting my particular ideal use case here for bittorrent v2

I have various libre licensed folders comprised of stuff from archive.org and whatnot that I've accumulated over the years. Most of the stuff would definitely be valuable to seed, however bittorrent v1 means I have to create a torrent for each specific and correct folder and file structure within. My goal would be to just list the directory as available via bittorrent v2, and it then seeds everything to others; because hashing in bittorrent v2 is per individual file, this is possible. This media library directory will remain mutable on my file system, and the BitTorrent v2 client will make sure the bittorrent v2 listing is up to date. The goal isn't so much publicly listing the entire media library, but to give back and continue seeding the contents of the library to those already requesting any such files that I may have; while my actual filesystem remains mutable.

The specific env for this will be a raspberry pi 5 machine, probably running https://github.com/varbhat/exatorrent for new content acquisition /cc https://github.com/varbhat/exatorrent/issues/406

kovalensky commented 4 months ago

@balupton Oh mate, if only you knew how dire the situation is in this direction.

This is a long-standing wish of BitTorrent users, not implemented anywhere yet. Many dead swarms could be alive today with this.

v2 allows it, but one client implementing this wouldn't be enough. There should be a BEP, or at least a de-facto standard, like the one in IPFS (a gossip protocol, I assume), so other devs could seamlessly integrate it.

izissise commented 4 months ago

I have a similar use case: writing a plugin for nbdkit that would keep only the currently used FS blocks in RAM.

I have a proof of concept here https://github.com/Wuageorg/nbdkit/blob/t0rrent/plugins/golang/examples/t0rrent/t0rrent.go

anacrolix commented 4 months ago

Thanks for sharing this balupton. This is actually a common misunderstanding of what BitTorrent v2 brings. It does improve the hashing system so that such an implementation would be easier and more efficient, but the real blocker is that BitTorrent partitions swarms on infohashes. This I think was originally just a natural choice as torrents were just a single file, but now it's more like a happy coincidence that provides a significant source of performance in BitTorrent vs for example IPFS.

If you announce content at a granularity smaller than the torrent, you create an enormous amount of overhead to maintain your availability on the network for content. The trade off is in your block size. The current "block size" is essentially the entire torrent. You can create extra layers, for example gossiping and taking advantage of the fact that peers that share common data are likely to share more related data.

BitTorrent v2 significantly improved the situation where you have multiple torrents (explicitly added) with overlapping data. The main reason for this is that piece size is not a factor in hashes, and there's a strict guarantee that files do not share blocks (this was supported in V1 but not universally implemented). A client can implement storage that intentionally takes advantage of this (anacrolix/torrent is a client that has been designed for this).
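As a loose illustration of the kind of storage described above, here is a sketch that keys files by their v2 pieces root so that two explicitly added torrents containing the same file end up sharing one copy. `sharedStore`, `root`, and `pathFor` are hypothetical names for this example and do not reflect the actual storage interfaces in anacrolix/torrent.

```go
package main

import "fmt"

// root is a file's 32-byte merkle root as found in a v2 file tree.
type root [32]byte

// sharedStore remembers where the data for each pieces root lives.
// Hypothetical type, for illustration only.
type sharedStore struct {
	paths map[root]string
}

func newSharedStore() *sharedStore {
	return &sharedStore{paths: make(map[root]string)}
}

// pathFor returns the existing location for a file with this root if one is
// already known, otherwise it records and returns the suggested path.
func (s *sharedStore) pathFor(r root, suggested string) (path string, reused bool) {
	if p, ok := s.paths[r]; ok {
		return p, true
	}
	s.paths[r] = suggested
	return suggested, false
}

func main() {
	store := newSharedStore()
	r := root{1, 2, 3} // placeholder root value

	p1, reused1 := store.pathFor(r, "torrentA/video.mkv")
	p2, reused2 := store.pathFor(r, "torrentB/renamed.mkv")
	fmt.Println(p1, reused1)
	fmt.Println(p2, reused2)
}
```

This only works because the root does not depend on the torrent's piece size or on neighbouring files, which is the property the paragraph above points at.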

You should also be aware of BEP46 (again anacrolix/* supports this and has used it in production). If you can handle a single publisher you can evolve torrents over time, and with a client with smart storage you can efficiently map data to newer versions of torrents.

Another solution is just to keep adding newer versions of torrents that you want to support, such as from a feed, and use the above smart torrent storage. Your client will just make itself available in all the related swarms (you need to know what those all are).

It's my observation that the "all content should be addressable" idea comes up frequently and leads to long forays away from what BitTorrent excels at, and from what its model of decentralised data works well for: the resilience of data is proportional to its popularity. If you can't find a popular "bookmark" to the content, the associated data will die. The end result is very long lived, high quality content, and short lived, very popular content.

I've operated a system that scales in a similar fashion to archive.org. I think the point I'm trying to make is that using BitTorrent for archive storage is an extremely long tail feature, where it helps when there's short bursts in popularity, but you can see they mostly rely on webseeding and centralisation to locate the torrents and bootstrap the data.

The opposite possibility is a system where you can do a decentralized search on a DHT and get back merkle tree pointers to the files you want. Then you could look up the trees and peers holding all the data on demand. I don't think anyone has solved that yet. I'd be willing to try (and I've tackled parts of it in the past), but I imagine I'd end up with another tyre to throw on the fire with IPFS and all its lesser known alternatives.

anacrolix commented 4 months ago

Re the last comment I think we could start a productive separate discussion to go over any interest in implementing the mentioned concerns on BitTorrent, including any existing attempts and research that exist. Feel free to create this discussion in anacrolix/torrent.

An update on these:

Not yet supported:

  • [ ] Replying to hash request
  • [ ] Handling hash reject
  • [ ] Asking for proof layers in outbound hash request
  • [ ] Handling BEP 52's piece reject independently of BEP 6
  • [ ] Creating hybrid and v2 torrents

I've realised this is probably the most important. It creates forward compatibility for all torrents made by this implementation.

  • [ ] Handling infohash upgrades during handshake

I think I might have already done this, I need to check.