JohnDoee / autotorrent2

Cross-seed matching and torrent lifecycle tool
https://johndoee.github.io/autotorrent2/
MIT License

Improve scan performance and correctness with a prefix tree #41

Closed kannibalox closed 7 months ago

kannibalox commented 1 year ago

This speeds up scanning by building an in-memory prefix tree, then generating a single iterable that is passed to executemany so that all the rows are inserted in a single transaction. The difference isn't very noticeable on smaller sets, but larger sets scan dramatically faster.
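For readers who want a concrete picture, here is a minimal sketch of the general idea, not the code from this PR. It assumes SQLite through Python's standard sqlite3 module; the `files` table, its columns, and the helper names `build_tree`, `iter_rows` and `insert_all` are hypothetical and chosen only for illustration.

```python
import sqlite3
from pathlib import PurePosixPath


def build_tree(files):
    """Build a nested-dict prefix tree from (relative_path, size) pairs."""
    root = {}
    for path, size in files:
        *dirs, name = PurePosixPath(path).parts
        node = root
        for part in dirs:
            node = node.setdefault(part, {})
        node[name] = size  # leaves hold the file size
    return root


def iter_rows(node, prefix=""):
    """Lazily yield one (name, directory, size) row per file in the tree."""
    for name, child in node.items():
        if isinstance(child, dict):
            yield from iter_rows(child, f"{prefix}/{name}" if prefix else name)
        else:
            yield (name, prefix, child)


def insert_all(conn, tree):
    """Insert every row with one executemany call inside one transaction."""
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (name TEXT, path TEXT, size INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO files (name, path, size) VALUES (?, ?, ?)",
            iter_rows(tree),
        )


files = [("data/a/x.bin", 10), ("data/a/y.bin", 20), ("data/b/z.bin", 30)]
conn = sqlite3.connect(":memory:")
insert_all(conn, build_tree(files))
rows = conn.execute("SELECT name, path, size FROM files ORDER BY name").fetchall()
```

Because `iter_rows` is a generator, the rows are never materialized as a second full list in memory; executemany consumes them one at a time while the transaction stays open.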

A side effect of this is that entries that may have previously been missed under an unsplittable root are now correctly marked as such.
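As a rough illustration of why a tree helps here, the sketch below passes an unsplittable root down to every descendant during the walk, so files in sibling subdirectories are marked too. The detection rule (`MARKERS`) and the function name `iter_marked` are hypothetical; the project's real criteria for an unsplittable root may differ.

```python
# Hypothetical rule: a directory is an unsplittable root if it directly
# contains one of these entries. The project's real criteria may differ.
MARKERS = {"BDMV", "VIDEO_TS"}


def iter_marked(node, prefix="", unsplittable_root=None):
    """Yield (file_path, unsplittable_root) for every file in a nested-dict tree.

    Directories are dicts; files are any non-dict value (here, the size).
    """
    if unsplittable_root is None and node.keys() & MARKERS:
        unsplittable_root = prefix  # the outermost match wins
    for name, child in node.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            # Descendants inherit the root found higher up in the tree.
            yield from iter_marked(child, path, unsplittable_root)
        else:
            yield (path, unsplittable_root)


tree = {
    "movie": {
        "BDMV": {"STREAM": {"00000.m2ts": 100}},
        "CERTIFICATE": {"id.bdmv": 1},
    },
    "album": {"01.flac": 5},
}
marked = dict(iter_marked(tree))
```

In this example the file under `movie/CERTIFICATE` is tagged with the root `movie` even though that directory contains no marker itself, while `album/01.flac` stays unmarked.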

For some rough performance testing, I used two real file sets, a small one (18k files) and a large one (500k files), and measured the wall time and max RSS usage of each run via `time -v`. The find row in the table below is `find <directory> -depth -type f -printf %s:%p\\n>/dev/null`, included as a reference for the "ideal" baseline. All scans were run three times and only the best values were kept, to account for caching.

| branch | time (small) | time (large) | max RSS (small) | max RSS (large) |
| --- | --- | --- | --- | --- |
| find | 0m 0.06s | 0m 1.85s | N/A | N/A |
| master | 0m 1.30s | 7m 45.35s | 151 MiB | 1217 MiB |
| scan-performance | 0m 1.15s | 0m 30.29s | 158 MiB | 741 MiB |

newadventure079 commented 1 year ago

@JohnDoee Can we get this merged soon?

awinnpii commented 9 months ago

@kannibalox I've implemented this and #45 locally and it's working great!