Sparrow0hawk / sagitta

🦀 A Rust SGE qacct clone
MIT License

Can we read from the bottom of the file to improve speed? #4

Closed Sparrow0hawk closed 1 year ago

Sparrow0hawk commented 1 year ago

If we read from bottom to top, we'd likely find job IDs faster, as new job IDs are appended to the bottom of the file and most jobs we're interested in are recent.

The rev_buf_reader crate seems ideal for this and is used in the rev-search branch.

However, it appears to be a lot slower when running on ARC.
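
For illustration, the bottom-to-top idea can be sketched with only the standard library. This is not the rev_buf_reader API and not the code in the rev-search branch. The names `find_from_end` and `is_job` are hypothetical, and the matcher assumes the job ID is the sixth colon-separated field of each accounting line.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical matcher: assumes the job ID is the sixth colon-separated field.
fn is_job(line: &[u8], job_id: &str) -> bool {
    line.split(|&b| b == b':').nth(5) == Some(job_id.as_bytes())
}

/// Scan `reader` backwards in blocks of `chunk` bytes and return the matching
/// line closest to the end of the input, if any.
fn find_from_end<R: Read + Seek>(
    reader: &mut R,
    job_id: &str,
    chunk: usize,
) -> io::Result<Option<String>> {
    let chunk = chunk.max(1) as u64; // a zero-sized chunk would never make progress
    let mut pos = reader.seek(SeekFrom::End(0))?;
    // Bytes of a line whose beginning lies in a block we have not read yet.
    let mut tail: Vec<u8> = Vec::new();

    while pos > 0 {
        let n = chunk.min(pos) as usize;
        pos -= n as u64;
        reader.seek(SeekFrom::Start(pos))?;
        let mut block = vec![0u8; n];
        reader.read_exact(&mut block)?;
        block.extend_from_slice(&tail);

        let mut lines = block.split(|&b| b == b'\n');
        // The first piece may be an incomplete line; keep it for the next round.
        let head = lines.next().unwrap_or_default();
        // Every other piece is a complete line; check them newest first.
        if let Some(hit) = lines.rev().find(|line| is_job(line, job_id)) {
            return Ok(Some(String::from_utf8_lossy(hit).into_owned()));
        }
        tail = head.to_vec();
    }

    // Whatever is left is the first line of the input.
    if is_job(&tail, job_id) {
        return Ok(Some(String::from_utf8_lossy(&tail).into_owned()));
    }
    Ok(None)
}
```

The function stops seeking as soon as a match is found, so a recent job near the end of the file only costs a few block reads. In practice a much larger `chunk` than a few bytes (for example 64 KiB) would be the sensible choice.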

Sparrow0hawk commented 1 year ago

Benchmarking against qacct using hyperfine.

rev-search branch version:

hyperfine 'sagitta -j 4541659 accounting' 'qacct -j 4541659'
Benchmark 1: sagitta -j 4541659 accounting
  Time (mean ± σ):     76.379 s ±  2.263 s    [User: 20.229 s, System: 8.579 s]
  Range (min … max):   72.777 s … 79.709 s    10 runs

Benchmark 2: qacct -j 4541659
  Time (mean ± σ):      5.859 s ±  0.818 s    [User: 4.309 s, System: 1.531 s]
  Range (min … max):    5.070 s …  7.344 s    10 runs

Summary
  'qacct -j 4541659' ran
   13.04 ± 1.86 times faster than 'sagitta -j 4541659 accounting'

Sparrow0hawk commented 1 year ago

Benchmarking against qacct using hyperfine.

main branch version:

hyperfine 'sagitta -j 4541659 accounting' 'qacct -j 4541659'
Benchmark 1: sagitta -j 4541659 accounting
  Time (mean ± σ):      5.571 s ±  0.965 s    [User: 4.009 s, System: 1.546 s]
  Range (min … max):    4.862 s …  7.497 s    10 runs

Benchmark 2: qacct -j 4541659
  Time (mean ± σ):      5.738 s ±  0.774 s    [User: 4.216 s, System: 1.504 s]
  Range (min … max):    5.011 s …  7.181 s    10 runs

Summary
  'sagitta -j 4541659 accounting' ran
    1.03 ± 0.23 times faster than 'qacct -j 4541659'

Sparrow0hawk commented 1 year ago

The target file is 4.8G with 13,429,283 lines.

Sparrow0hawk commented 1 year ago

Flamegraph for main branch

(flamegraph image: sagitta-readbuf)

Sparrow0hawk commented 1 year ago

Flamegraph for rev-search branch.

(flamegraph image: sagitta-Revreadbuf)

Sparrow0hawk commented 1 year ago

The issues above were a bit of a red herring.

The real problem was that I was calling collect at the end of find_line, which meant iterating through the entire file. Switching this to .find gave the desired behaviour of stopping the iterator as soon as the ID was found.
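
A minimal sketch of the difference, not the actual sagitta code: the function names and the `is_job` helper are hypothetical, and the matcher assumes the job ID is the sixth colon-separated field of an accounting line.

```rust
use std::io::BufRead;

// Hypothetical matcher: assumes the job ID is the sixth colon-separated field.
fn is_job(line: &str, job_id: &str) -> bool {
    line.split(':').nth(5) == Some(job_id)
}

// Eager: `collect` drains the whole iterator, so every line is read
// even when the match is near the start.
fn find_line_collect<R: BufRead>(reader: R, job_id: &str) -> Option<String> {
    let matches: Vec<String> = reader
        .lines()
        .map_while(Result::ok) // stop at the first I/O error
        .filter(|line| is_job(line, job_id))
        .collect();
    matches.into_iter().next()
}

// Lazy: `find` short-circuits, so reading stops at the first matching line.
fn find_line_short_circuit<R: BufRead>(reader: R, job_id: &str) -> Option<String> {
    reader
        .lines()
        .map_while(Result::ok)
        .find(|line| is_job(line, job_id))
}
```

Both functions return the same line, but only the second leaves the rest of the input unread. On a multi-gigabyte file, that is the gap between scanning every line and stopping at the first hit.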

Sparrow0hawk commented 1 year ago

hyperfine 'sagitta -j 4541659 accounting' 'qacct -j 4541659'
Benchmark 1: sagitta -j 4541659 accounting
  Time (mean ± σ):     233.3 ms ±  47.7 ms    [User: 206.5 ms, System: 25.2 ms]
  Range (min … max):   183.7 ms … 290.6 ms    10 runs

Benchmark 2: qacct -j 4541659
  Time (mean ± σ):      5.778 s ±  0.900 s    [User: 4.087 s, System: 1.671 s]
  Range (min … max):    4.818 s …  6.933 s    10 runs

Summary
  'sagitta -j 4541659 accounting' ran
   24.76 ± 6.36 times faster than 'qacct -j 4541659'