bitmagnet-io / bitmagnet

A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration.
https://bitmagnet.io/
MIT License
2.07k stars 80 forks source link

Use budgeted count for aggregations #128

Closed mgdigital closed 4 months ago

mgdigital commented 4 months ago

This PR introduces a significant optimisation of aggregations (counts).

It takes advantage of the fact that a Postgres query plan can tell you the cost of a query up-front, along with a rough estimate of the count based on indexes. All count queries now have a "budget", defaulting to 5,000. If the budget is exceeded according to the query plan, then the estimate will be returned (and the UI will display an estimate symbol ~ next to the associated count), otherwise the query will be executed and an exact count will be returned.

The accuracy of the estimate seems to be within 10-20% of the exact count in most cases - though accuracy depends on selected filter criteria and what is being counted, I've noticed bigger discrepancies but overall it seems like an acceptable trade-off.

The background cache warmer has been removed and aggregations are now real time again (the cache warmer was at best a short term mitigation while I figured out a better solution). The cache TTL has been reduced to 10 minutes. It was previously increased to allow the cache warmer to be run less frequently.

There are also some adjustments to the indexes that improve performance and the accuracy of estimations. For large indexes the migration may take a while to run: in my tests on 12 million torrents it took 15 minutes.