bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License
2.05k stars, 78 forks

Store SHA1 for files when available #167

Closed by Send8213 3 months ago

Send8213 commented 3 months ago

Is your feature request related to a problem? Please describe

Some torrents carry a SHA1 hash for each of their files in the torrent metadata. These hashes could later serve as hints for deduplicating data or for finding other torrents to cross-seed.

Describe the solution you'd like

Store the SHA1 hash for a file when it is present in a torrent.

Describe alternatives you've considered

Using file size to find files that can potentially be deduplicated/cross-seeded instead.
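As a rough illustration of the size-based alternative, the sketch below groups files by byte length and keeps only the sizes that occur in more than one torrent. The function name `candidates_by_size` and the `(torrent_id, path, size)` tuple layout are assumptions made for this example and are not part of bitmagnet. A size match is only a weak hint; the contents would still need to be verified.

```python
from collections import defaultdict


def candidates_by_size(files):
    """Group (torrent_id, path, size) tuples by size.

    Returns only the sizes shared by files from at least two different
    torrents, since those are the possible dedup/cross-seed candidates.
    """
    groups = defaultdict(list)
    for torrent_id, path, size in files:
        groups[size].append((torrent_id, path))
    return {
        size: entries
        for size, entries in groups.items()
        # Count distinct torrents, not files, per size.
        if len({torrent_id for torrent_id, _ in entries}) > 1
    }
```

Two files of equal size inside the same torrent are ignored here, because the goal is to match files across torrents.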

Additional context

http://bittorrent.org/beps/bep_0047.html

sha1 20 bytes. The SHA1 digest calculated over the contents of the file itself, without any additional padding. Can be used to aid file deduplication [2]. The hash should only be considered as a hint, pieces hashes are the canonical reference for integrity checking.
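For illustration, here is a minimal sketch of how the optional `sha1` key could be read from a decoded torrent. It assumes the key sits in each file dictionary for multi-file torrents and directly in the info dictionary for single-file torrents, which is how I read BEP 47. The names `bdecode` and `file_sha1_hints` are made up for this example and do not come from bitmagnet.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", i)
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def file_sha1_hints(info: dict) -> dict:
    """Map file path -> hex SHA1 for files that carry a 20-byte 'sha1' key."""
    hints = {}
    if b"files" in info:  # multi-file torrent
        for f in info[b"files"]:
            digest = f.get(b"sha1")
            if isinstance(digest, bytes) and len(digest) == 20:
                path = "/".join(p.decode("utf-8", "replace") for p in f[b"path"])
                hints[path] = digest.hex()
    else:  # single-file torrent (assumed: key is in the info dict itself)
        digest = info.get(b"sha1")
        if isinstance(digest, bytes) and len(digest) == 20:
            hints[info[b"name"].decode("utf-8", "replace")] = digest.hex()
    return hints
```

Files without the key are simply skipped, which matches the idea of storing the hash only when it is present. As the BEP notes, the value is a hint and not a substitute for piece hashes.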

I don't know whether these file hashes have actually been used in practice, but if they have, they could be quite useful for identifying files across torrents, as an alternative to waiting for BitTorrent v2 support and adoption.

Send8213 commented 3 months ago

It turns out that this hash is less common than I thought.

It appears in 0.4% of the torrents in a set collected by 2018, and in 0.1% of a much smaller set of more recent torrents.

I am withdrawing this feature request, as I do not believe it is worth implementing.