bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License
2.38k stars 94 forks

Refactor torrent processing #106

mgdigital closed 8 months ago

mgdigital commented 8 months ago

This reworks the torrent creation workflow. Previously, a Torrent record was always created together with a corresponding TorrentContent record, which was usually empty; a classify_torrent queue job would then attempt classification and update the TorrentContent record.

Following this update, Torrent records are always created in isolation, and a process_torrent job then runs in the queue. This job not only classifies the torrent but also performs other tasks such as search reindexing. Torrents that have already been matched to a piece of content are not rematched (unless requested in the CLI command, see below), which saves a significant amount of work.
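As a rough illustration of the skip-unless-rematch behaviour described above, here is a minimal Go sketch. The `Torrent` and `TorrentContent` structs, the `MatchFunc` hook and the `processTorrent` function are simplified, hypothetical stand-ins written for this example, not the actual types or job handler in this PR.

```go
package main

import "fmt"

// Torrent and TorrentContent are simplified, hypothetical stand-ins for the
// records described above, not bitmagnet's real models.
type Torrent struct {
	InfoHash string
	Name     string
}

type TorrentContent struct {
	InfoHash  string
	ContentID string // empty while the torrent is unmatched
}

// MatchFunc is a hypothetical hook for the expensive content-matching step.
type MatchFunc func(t Torrent) string

// processTorrent keeps an existing match unless rematch is requested;
// otherwise it runs the matcher. The boolean reports whether matching ran.
func processTorrent(t Torrent, existing *TorrentContent, rematch bool, match MatchFunc) (TorrentContent, bool) {
	if existing != nil && existing.ContentID != "" && !rematch {
		return *existing, false
	}
	return TorrentContent{InfoHash: t.InfoHash, ContentID: match(t)}, true
}

func main() {
	match := func(t Torrent) string { return "content-for-" + t.Name }
	t := Torrent{InfoHash: "abc123", Name: "example"}

	// A newly created Torrent has no TorrentContent yet, so matching runs.
	first, ran := processTorrent(t, nil, false, match)
	fmt.Println(first.ContentID, ran)

	// An already matched torrent is left alone unless rematch is true.
	_, ran = processTorrent(t, &first, false, match)
	fmt.Println(ran)
	_, ran = processTorrent(t, &first, true, match)
	fmt.Println(ran)
}
```

In this sketch the saving comes from never calling `match` for a torrent that already has a non-empty `ContentID`, unless the caller opts in.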

A new entity type, TorrentHint, has been created for providing hints to the classifier (currently used only by the import tool). Previously, any hints for the classifier were added directly to the TorrentContent record. This conflated 2 different things (the classification result and the hints for the classifier), which is problematic when it comes to reclassification.
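To make the separation concrete, the sketch below keeps hints and results in different structs, so a classification run can rebuild the result from scratch without touching the hint. All field names and the `classify` function are assumptions made for illustration; the real TorrentHint entity may look quite different.

```go
package main

import "fmt"

// TorrentHint holds classifier input. Field names are hypothetical.
type TorrentHint struct {
	InfoHash    string
	ContentType string
	Title       string
}

// TorrentContent holds the classifier output. Field names are hypothetical.
type TorrentContent struct {
	InfoHash    string
	ContentType string
	Title       string
}

// classify builds a fresh result on every run. The hint is only read, so it
// survives unchanged and can be reused on the next reclassification.
func classify(infoHash, name string, hint *TorrentHint) TorrentContent {
	result := TorrentContent{InfoHash: infoHash, ContentType: "unknown", Title: name}
	if hint != nil {
		if hint.ContentType != "" {
			result.ContentType = hint.ContentType
		}
		if hint.Title != "" {
			result.Title = hint.Title
		}
	}
	return result
}

func main() {
	hint := &TorrentHint{InfoHash: "abc123", ContentType: "movie"}
	fmt.Println(classify("abc123", "some.release.name", hint))
	fmt.Println(classify("abc123", "some.release.name", nil))
}
```

Because the result is never used as an input, discarding it and classifying again yields the same outcome for the same hint.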

Additionally, a new CLI command, reprocess, has been added, which reprocesses all torrents: it classifies them and updates the search index. Already matched torrents are rematched only when the --rematch flag is passed.
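For readers unfamiliar with opt-in boolean flags, the following sketch parses a `--rematch` flag that defaults to false. It uses Go's standard `flag` package purely for illustration; `parseReprocessFlags` is a hypothetical helper, and this is not a claim about how the project wires up its CLI.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseReprocessFlags parses the arguments of a hypothetical reprocess
// subcommand and reports whether rematching was requested.
func parseReprocessFlags(args []string) (bool, error) {
	fs := flag.NewFlagSet("reprocess", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	rematch := fs.Bool("rematch", false, "rematch torrents that are already matched to content")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *rematch, nil
}

func main() {
	// Without the flag, already matched torrents would be left as they are.
	rematch, err := parseReprocessFlags(nil)
	fmt.Println(rematch, err)

	// With the flag, rematching is requested explicitly.
	rematch, err = parseReprocessFlags([]string{"--rematch"})
	fmt.Println(rematch, err)
}
```

Defaulting the flag to false keeps the cheaper behaviour as the norm, with full rematching as a deliberate choice.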

A few reasons for this change: