a-detiste / cruft-ng

rewrite of Debian "cruft" package
GNU General Public License v2.0
24 stars 5 forks source link

cruft: parallelize globs matching #13

Open cgzones opened 2 years ago

cgzones commented 2 years ago

The most expensive operation of cruft is the filtering of the existing paths with the configured filters and excludes. Perform that task in parallel by using C++17 parallel algorithms. One downside is, since the standard library does not implement this functionality itself, this requires linking against tbb (Threading Building Blocks).

Sequenced:

Benchmark 1: ./cruft -E ./explain/ -F ./rules/ -I ./ignore -R ./ruleset
  Time (mean ± σ):      7.179 s ±  0.079 s    [User: 6.366 s, System: 2.519 s]
  Range (min … max):    7.060 s …  7.291 s    10 runs

Parallel:

Benchmark 1: ./cruft -E ./explain/ -F ./rules/ -I ./ignore -R ./ruleset
  Time (mean ± σ):      4.762 s ±  0.058 s    [User: 6.659 s, System: 2.414 s]
  Range (min … max):    4.682 s …  4.849 s    10 runs
a-detiste commented 1 year ago

Can you rebase ? Looks good.

a-detiste commented 1 year ago

thanks

a-detiste commented 1 year ago

I need to test this on Buster

cgzones commented 1 year ago

While rerunning the benchmark I did not notice a significant time saving anymore. Maybe the simplification of myglob() made this obsolete.

a-detiste commented 1 year ago

It looks like a case of too early optimisation on both sides. It was a big mistake on my side to optimize for a match, I should have instead optimized for a non match (what happens 99% of the time).