bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License
2.51k stars 102 forks

Ability to store metaInfo files #302

Open LordMike opened 3 months ago

LordMike commented 3 months ago

Is your feature request related to a problem? Please describe

I have a project that acquires torrent metafiles en masse, and I've been on the lookout for a DHT crawler. I'm trying out bitmagnet, but having reviewed the database schema, it doesn't seem like bitmagnet keeps the torrent metadata files on hand after they've been ingested. It'd be cool to have the option to persist the original metadata files either on disk or in the Postgres database.

Describe the solution you'd like

An option that could be enabled to persist metainfo files, possibly on disk (e.g. in a tree structure like 00/11/22/001122.....torrent).

Once a new torrent is identified and it's stored to the DB (at which point I assume it's "new" and doesn't exist on disk), the binary blob that is the torrent file could additionally be saved to disk.
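A minimal sketch of that layout and the "save only if new" step, in Python for illustration. The function names, the three-level sharding depth and the `.tmp` suffix are assumptions made for this example, not anything bitmagnet provides.

```python
import os
import tempfile
from pathlib import Path


def metainfo_path(root: Path, info_hash_hex: str, levels: int = 3) -> Path:
    """Map an info hash to root/00/11/22/001122....torrent (hypothetical layout)."""
    h = info_hash_hex.lower()
    if len(h) != 40 or any(c not in "0123456789abcdef" for c in h):
        raise ValueError("expected a 40-character hex info hash")
    # One directory level per leading byte of the hash keeps directories small.
    shards = [h[2 * i : 2 * i + 2] for i in range(levels)]
    return root.joinpath(*shards, h + ".torrent")


def save_metainfo(root: Path, info_hash_hex: str, blob: bytes) -> bool:
    """Write the raw bytes once; return False if the file already exists."""
    target = metainfo_path(root, info_hash_hex)
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename, so readers
    # never observe a half-written .torrent file.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(blob)
        os.replace(tmp, target)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return True
```

Storing the bytes exactly as received, rather than re-encoding them, is what keeps the file consistent with its info hash.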

Describe alternatives you've considered

N/A

Additional context

N/A

DerBunteBall commented 3 months ago

That's available via the save_pieces option of the DHT crawler.

Valid torrents can be assembled from the DB. Only in a few special situations are you limited to producing "dummy torrents".

LordMike commented 3 months ago

It seems that option will save the pieces binary blob, which is only part of the info dictionary. In order to recreate the info dictionary and thus the original torrent I assume more information might be needed.

As an example, individual files may have hashes or other arbitrary content associated, which it didn’t seem like bitmagnet stored either.

Thanks for the quick reply :)

Mike.

DerBunteBall commented 3 months ago

Bitmagnet stores all the needed information, except in a few very specific cases.

https://en.wikipedia.org/wiki/Torrent_file#File_struct

All the needed information is stored in the database when you store the pieces. The file list, with length, path and so on, is stored. You can write a simple Python script that generates the torrent, e.g. with Torf, and the result has a valid info dict hash.

This only fails in cases where clients put more or less unspecified things into the info dict, which is the part that gets hashed. That only happens when optional md5sums or .utf-8 keys are stored. In those cases the generated torrent wouldn't have the correct info dict hash (torrent checksum). But these cases are relatively rare. In such a case you can only generate dummy torrents, e.g. for checking data validity on disk.
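To illustrate why extra keys matter, here is a small self-contained sketch in Python. It uses a hand-written bencode encoder instead of Torf, and the sample field values are made up for the example. The info hash is the SHA-1 of the bencoded info dict, so an info dict rebuilt from only the standard fields hashes differently from one that also carried md5sum or name.utf-8.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencode encoder (ints, byte strings, text, lists, dicts)."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(info: dict) -> str:
    """SHA-1 over the bencoded info dict, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


# Hypothetical single-file info dict as a client might have published it,
# including two optional, non-essential keys.
original = {
    "name": "example.bin",
    "name.utf-8": "example.bin",
    "piece length": 16384,
    "pieces": bytes(20),      # placeholder piece hash
    "length": 1000,
    "md5sum": "0" * 32,       # placeholder value
}

# What can be rebuilt if only the standard fields were kept.
rebuilt = {k: original[k] for k in ("name", "piece length", "pieces", "length")}

same_fields = dict(rebuilt)
hashes_match_when_identical = info_hash(same_fields) == info_hash(rebuilt)
hashes_match_when_keys_dropped = info_hash(rebuilt) == info_hash(original)
```

Here `hashes_match_when_identical` is true and `hashes_match_when_keys_dropped` is false, which is the "dummy torrent" situation: the rebuilt file describes the same data but no longer matches the original info hash.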

LordMike commented 3 months ago

And that’s this feature request: an option to store the meta files as they were, on disk (for example) under their info hash. :)

I know it’s a niche feature - I was partly hoping to expand on my torrent file project which hoards torrent files.

Mike.

LordMike commented 3 months ago

Note: I implemented the feature here. If there is any interest at all, I can open a PR back.

leofidus commented 3 months ago

Another advantage of storing the metainfo as files is that it removes load and storage requirements from the Postgres database when compared to save_pieces. For example, I have my bitmagnet Postgres data on an SSD, but would prefer storing .torrent files on cheap spinning rust.

I know I can technically already do that with tablespaces, but dumping it as files in a directory structure would be more user-friendly and make it easier to integrate with other software.

Dobatymo commented 2 weeks ago

@DerBunteBall

.utf-8 keys

Actually these are quite common in the Chinese community in my experience. So losing all these torrents is bad. Torrents with these keys are also currently discarded due to the utf8 check on the normal keys, which usually fails on torrents with these extra keys.

So interpreting these keys when they exist would be preferred imo.
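A rough sketch of what "interpreting these keys" could look like, in Python. It assumes an already-decoded info dict with byte-string keys and the commonly seen `name.utf-8` / `path.utf-8` key names; the helper names are hypothetical and this is not how bitmagnet currently handles them.

```python
def decode_text(raw: bytes) -> str:
    """Decode as UTF-8, replacing invalid bytes instead of rejecting the torrent."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("utf-8", errors="replace")


def display_name(info: dict) -> str:
    """Prefer name.utf-8 when present, otherwise fall back to name."""
    raw = info.get(b"name.utf-8") or info.get(b"name", b"")
    return decode_text(raw)


def file_path(entry: dict) -> list:
    """Same preference for one entry of the files list: path.utf-8, then path."""
    parts = entry.get(b"path.utf-8") or entry.get(b"path", [])
    return [decode_text(p) for p in parts]


# The plain "name" holds bytes in a legacy encoding (not valid UTF-8),
# while "name.utf-8" carries the readable form.
info = {
    b"name": b"\xb2\xe2\xca\xd4",
    b"name.utf-8": "测试".encode("utf-8"),
}
name = display_name(info)
```

With this preference order the torrent above yields a readable name instead of failing a UTF-8 check on the plain key.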