bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License

Searching medium-large (~2.5M) torrent database takes a very long time #332

Closed: Max-Pare closed this issue 1 week ago

Max-Pare commented 1 week ago

Describe the bug

Searching the database for torrents is extremely slow.

To Reproduce

Expected behavior

A text search should not take anywhere near 5+ minutes, even on slow hardware.

Notes

CPU usage while searching does not go above 5%, and disk reads while searching do not go above ~100 kbps.

Environment Information (Required)

other

Logs don't show any errors or anything other than general info.

DerBunteBall commented 1 week ago

What type of disk is the database stored on?

Max-Pare commented 1 week ago

4TB HDD

DerBunteBall commented 1 week ago

Consider the following:

  1. Even if Bitmagnet hasn't enabled any of the very I/O-intensive options (Pieces and Filelists), HDDs are generally troublesome.
  2. Searches will always take a long time while a large reprocess or indexing job is running.
  3. Searches can also be slow on HDDs if Bitmagnet is configured for high parallelism. HDDs suffer from what I would call “counter-rotating I/O”: the drive head has to jump around a lot. Since Bitmagnet is constantly writing data into the DB, complex read operations take a very long time to gather their data, and this gets worse the more parallelism there is or the more data flows into the database. You can see whether a large operation is running in the metrics, and in the latest version probably also in the UI.

Max-Pare commented 1 week ago

Alright, that's good to know. Thanks for the info.

DerBunteBall commented 1 week ago

It should be quite acceptable in normal operation.

That is, when Bitmagnet isn't processing large numbers of jobs and is only collecting from the DHT. I see search times of about 20-50 seconds on large databases (roughly 10 times larger than yours). It hardly gets any better than that with HDDs. NVMe SSDs are best suited as storage, or at least regular SATA SSDs, since they take out the mechanical seeking that really eats up time.
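To make the two effects discussed in this thread concrete (random seeks on an HDD, and background indexing competing with a search for the same disk), here is a crude back-of-envelope model in Python. It assumes PostgreSQL's default 8 KiB page size, a search that needs a hypothetical number of uncached random page reads, illustrative rather than measured per-read latencies, and a simple round-robin model in which each competing I/O stream adds one full wait per read. None of these numbers come from Bitmagnet, and `query_seconds` and `read_throughput_kib_s` are hypothetical helpers for illustration only.

```python
# Crude back-of-envelope model: query time when every page read is a random,
# uncached read. All latencies are illustrative assumptions, not measurements.

PAGE_BYTES = 8 * 1024  # PostgreSQL's default block size (8 KiB)

# Assumed average latency of one random page read, in seconds.
LATENCY_S = {
    "hdd": 10e-3,        # ~10 ms: head seek plus rotational delay
    "sata_ssd": 100e-6,  # ~0.1 ms
    "nvme_ssd": 20e-6,   # ~0.02 ms
}


def query_seconds(random_pages: int, device: str, competing_streams: int = 0) -> float:
    """Estimated time for a query needing `random_pages` random page reads.

    Assumes the disk serves the query and each competing I/O stream
    (e.g. background indexing writes) in turn, so every read waits
    (1 + competing_streams) times the base latency.
    """
    return random_pages * LATENCY_S[device] * (1 + competing_streams)


def read_throughput_kib_s(device: str, competing_streams: int = 0) -> float:
    """Read throughput the query itself sees under the same model, in KiB/s."""
    per_read_s = LATENCY_S[device] * (1 + competing_streams)
    return (PAGE_BYTES / 1024) / per_read_s


if __name__ == "__main__":
    pages = 10_000  # hypothetical number of uncached pages a search touches
    for dev in LATENCY_S:
        print(f"{dev:>9}: idle {query_seconds(pages, dev):8.1f} s, "
              f"busy {query_seconds(pages, dev, competing_streams=3):8.1f} s, "
              f"{read_throughput_kib_s(dev, competing_streams=3):9.0f} KiB/s")
```

With 10,000 uncached page reads and three competing streams, this model gives 400 s on the HDD versus 4 s on the SATA SSD, while the query itself sees only about 200 KiB/s of reads. A seek-bound query like this leaves the CPU mostly idle, which is consistent with the low CPU usage reported above, though real numbers depend heavily on caching, indexes and the actual workload.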

Max-Pare commented 1 week ago

I moved my database to a SATA SSD, but I'll have to start indexing from scratch (for reasons unrelated to Bitmagnet). If the issue persists, I'll let you know.

mgdigital commented 1 week ago

Thanks for answering this, @DerBunteBall.