Torrents out of Tribler dir

ichorid commented 5 years ago

Tribler supports seeding torrent files from any dir on any path in the system. However, users cannot share their existing media collections downloaded with other BitTorrent clients. To do so, they must move their entire collections to the Tribler Downloads folder and then add the corresponding .torrent files to Tribler as downloads. That is very unintuitive. Besides, many users on Windows use more than one drive to store media (films, music), while their Tribler user directory resides on drive C:. This forces them to copy or reorganize their files, which is unacceptable for many. In addition, we should discuss creating a procedure to map files to torrents, so users can just scan their entire media collection and immediately join the corresponding swarms (possible student project?).

ichorid commented 5 years ago

A user complaining about this problem: https://forum.tribler.org/t/exact-steps-to-start-seeding-from-torrent-files-i-already-have/5054

zimio commented 5 years ago

This sounds interesting. Can I have a go at solving this issue? Is more discussion needed?

ichorid commented 5 years ago

@zimio, thanks for your interest in the Tribler project! Your contribution would be very welcome.

At the moment, Tribler is capable of seeding files outside of the Tribler Downloads folder. It does that by default, i.e. a user does not have to move anything if they just want to share their files. However, when someone wants to share a torrent collection they already have in order to join the existing swarms, there is a problem:

If the user adds a .torrent file, Tribler rightfully recognizes this action as a new download. It is correct and expected behavior since Tribler knows nothing about where the user keeps their files for this torrent;
If the user adds the files from some torrent, Tribler recognizes this action as a creation of a new torrent, which is again, completely right and expected, since Tribler does not know anything about the existence of the corresponding .torrent file.

@zimio, I suggest you download a couple of torrents with some other client and then try to share them with Tribler, to get a feel for the problem. After that, could you please share your thoughts on a solution?

zimio commented 5 years ago

Hello, thanks. I've been thinking about this.

So we have two cases: adding a new torrent file and adding new files.

1-) Adding new torrent file.

The problem is that we don't know the location of the files already downloaded for the torrent, and that information is not available in the torrent file itself. So we should ask the user to give us a directory where we can scan for files. This could be analogous to the share directory in other P2P clients.

So Tribler goes and finds files with the same hashes as the ones in the torrent file added.

If a full match is found (all files), then we start seeding the torrent.

If a partial match is found (only some files), then we start seeding only those parts we already have and ask the user whether they want to download the rest.

If no match is found (no files), then it must be a new download.
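The three outcomes above could be sketched roughly as follows. This is only an illustration, not Tribler code: `classify_match` is a hypothetical helper, the expected file list is assumed to have been read from the .torrent already, and the check compares only relative path and size. A real implementation would still have to verify the piece hashes before seeding.

```python
import os
from typing import List, Tuple


def classify_match(expected: List[Tuple[str, int]],
                   content_dir: str) -> Tuple[str, List[str]]:
    """Classify how much of a torrent's content is present in content_dir.

    expected: (relative path, size in bytes) pairs, assumed to come from
    the already parsed .torrent file.
    Returns ("full" | "partial" | "none", list of relative paths found).
    """
    found = []
    for rel_path, length in expected:
        candidate = os.path.join(content_dir, rel_path)
        # Cheap pre-check only: the file exists and has the expected size.
        if os.path.isfile(candidate) and os.path.getsize(candidate) == length:
            found.append(rel_path)

    if expected and len(found) == len(expected):
        return "full", found       # seed right away (after a hash check)
    if found:
        return "partial", found    # seed what we have, ask about the rest
    return "none", found           # treat as a new download
```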

2-) Adding files.

This is a bit more difficult, but I think the right approach would be to hash the files and search for an existing swarm that contains those files. If no swarm is found, then we create a new torrent to seed those files.

How does it sound to you?

ichorid commented 5 years ago

@zimio Idea for case (1) sounds reasonable. We can devise a routine that scans user's drive recursively and builds a file list. Then, it collects all .torrent files out of that list, and for each .torrent tries to find matching dir/files in that list. Then it proceeds to seed those torrents that appear to be fully/mostly complete.
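A minimal sketch of the scanning step, assuming a single pass with `os.walk`. The name `scan_tree` and the size-keyed index are illustrative choices, not part of Tribler; parsing the collected .torrent files and matching them against the index is left out.

```python
import os
from collections import defaultdict


def scan_tree(root):
    """Walk root once and split what is found into two structures.

    Returns (sorted list of .torrent paths,
             dict mapping file size -> list of paths with that size).
    """
    torrent_paths = []
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if name.lower().endswith(".torrent"):
                torrent_paths.append(full)
                continue
            try:
                by_size[os.path.getsize(full)].append(full)
            except OSError:
                # Unreadable or vanished file: skip it, keep scanning.
                continue
    return sorted(torrent_paths), dict(by_size)
```

Indexing by size gives a cheap first filter: only files whose size equals a length listed in a torrent need to be considered for that torrent at all.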

Regarding case (2), the problem is that torrents are identified by infohashes, and one can only search DHT for these. Unfortunately, aside from the content, the infohash of a torrent depends on the torrent's title and the way the torrent was created. This means we cannot easily identify the torrent by its contents.
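To illustrate the point: in BitTorrent v1 the infohash is the SHA-1 of the bencoded `info` dictionary, which includes the name and the piece length as well as the piece hashes. The sketch below uses a tiny hand-written bencoder and a hypothetical `make_info` helper for a single-file torrent; it is not how Tribler or libtorrent build torrents, only a demonstration that identical content can yield different infohashes.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder for ints, str/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def infohash(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


def make_info(name: str, data: bytes, piece_length: int) -> dict:
    """Hypothetical helper: info dict for a single-file v1 torrent."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


data = b"same payload" * 1000
original = infohash(make_info("film.mkv", data, 16384))
renamed = infohash(make_info("Film (2019).mkv", data, 16384))
repieced = infohash(make_info("film.mkv", data, 32768))
# All three describe the same bytes, yet the three infohashes differ.
```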

So, I propose to focus on implementing your solution for case (1). I advise you to fork the current devel branch and start working "top-down" on the issue. A good first step would be creating an (obviously failing) standalone unit test (with a stable test samples dir, etc.) and working from there.

zimio commented 5 years ago

Thanks, I will get started with approach (1). Just as a follow-up to approach (2), I think that maybe we could use the SearchCommunity to look for torrents using more metadata than just the infohash. We could maybe extend it so we could find individual files and merge swarms this way. I'm afraid it is not very scalable, but some clients have tried swarm merging for downloads by searching for files of the same size.

https://github.com/BiglySoftware/BiglyBT/wiki/Swarm-Merging

https://torrentfreak.com/vuze-speeds-up-torrent-downloads-through-swarm-merging-150320/

Approach (2) needs more discussion and is fairly complex to implement. So I will start with approach (1) now, and then we can discuss the next steps.

ichorid commented 5 years ago

Unfortunately, SearchCommunity will be gone in Tribler 7.3. Recently, the Tribler Channels subsystem was rewritten from scratch and replaced by the GigaChannel subsystem. The new metadata format provides no information about individual files in the torrent (except for the overall torrent size). So, to facilitate swarm merging, we'll have to build a new, separate subsystem. I guess we'll figure it out later.

zimio commented 5 years ago

Can I get some feedback here:

https://github.com/Tribler/tribler/pull/4433

ichorid commented 5 years ago

We can look into the TLSH fuzzy hashing algorithm to detect duplicate torrents.

slrslr commented 4 years ago

"procedure to map files to torrents, so users can just scan their entire media collection and immediately join the corresponding swarms"

In 2018 I submitted a similar request on the libtorrent issue tracker: https://github.com/arvidn/libtorrent/issues/2838#issuecomment-608352978

https://en.wikipedia.org/wiki/Everything_%28software%29 finds a file by name in one second among 4 million files on my HDDs.

It would be nice if, when the user adds a torrent, Tribler found and linked the already existing files no matter how they are named or where they are located (even if spread across different directories). If you rename a file in the OS and a peer later requests it, Tribler would detect the missing file, automatically search the file hash index for a file with the same hash, and re-link that file to the torrent. Sorry, I am just dreaming, I am not a developer.
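A rough sketch of what such a hash index could look like. Everything here is hypothetical: the function names are made up, and it assumes the client recorded a whole-file SHA-1 digest when it first saw the file (classic BitTorrent v1 metadata stores piece hashes, not whole-file hashes, so that digest would have to be kept separately).

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-1 of a file's content, read in chunks to keep memory flat."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_hash_index(root):
    """Map content digest -> first path found with that content."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                index.setdefault(file_digest(full), full)
            except OSError:
                continue  # unreadable file: leave it out of the index
    return index


def relink(known_digest, index):
    """Return the current path of a renamed/moved file, or None."""
    return index.get(known_digest)
```

Hashing every file is far slower than a name lookup, so in practice such an index would probably be built in the background and filtered by file size first.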

mrcruz commented 4 years ago

This issue is confusing to me. The original post talks mostly about being able to select a different directory for already downloaded torrents, and you can do that already. The only sentence that still applies to the latest version is the one that @slrslr is referring to. Shouldn't this be in a separate issue with a better title?

Dmole commented 3 years ago

IMO it's better to leave that sort of feature to the file system: mklink /H new.file old.file on Windows, or ln old.file new.file on Unix-like systems. https://en.wikipedia.org/wiki/Hard_link
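The same hard-link idea can be expressed portably from Python with `os.link`. This is only a sketch with a hypothetical helper name, `link_into`. Note that hard links only work within a single file system or volume, so this would not help when the media sits on a different drive than the downloads folder.

```python
import os


def link_into(existing_file, wanted_path):
    """Create wanted_path as a hard link to existing_file.

    Both paths must be on the same file system; os.link raises
    OSError otherwise, or if wanted_path already exists.
    """
    os.makedirs(os.path.dirname(wanted_path) or ".", exist_ok=True)
    os.link(existing_file, wanted_path)
    return wanted_path
```

After the call both names refer to the same data on disk, so no extra space is used and either name can be removed without affecting the other.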

Tribler / tribler

Torrents out of Tribler dir #4222