cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

Basic bench #66

Closed: anshumanmohan closed this 1 year ago

anshumanmohan commented 1 year ago

This PR adds an extremely low-res benchmarking suite to slow-odgi.

I ripped out lots of the stuff that was useful for actually ensuring correctness but was a pain in terms of Make and turnt engineering.

  1. validate is just run against valid graphs; I've thrown out the infrastructure where we perturbed the graph to make nice test-cases. This means validate is less of a delicate flower when it comes to setup, oracle, and test: it can just be added to the usual list of oracles and targets.
  2. For inject and overlap I've stashed precomputed input files under slow_odgi/setup/; this means that Make (and thus turnt) does not need to run setup environments for these.
  3. turnt's use is now purely cursory. I just use it to line up all my stuff and orchestrate it. All the outputs are sent to /dev/null, and all the oracle files (e.g. k.depth, k.inj) are empty. Turnt's oks are 100% vacuous.

I don't really think this stuff can/should be merged; the changes here are too destructive. In milestone 0.2 I'll do a more careful version that brings back testing and is therefore mergeable.

At the risk of repeating myself: fair warning, there's no testing going on over here!! If someone wanted to speed slow-odgi up using an automated tool that guarantees the preservation of program semantics, that is fine because the code as written was tested relatively rigorously before I ripped the testing infrastructure out. In order to make more thoughtful changes that really get into the slow-odgi code, please wait for me to improve this stuff in milestone 0.2.

Here are the numbers I got:

time make fetch 
# grabs .gfa files from odgi's repository. 

real    0m1.149s
user    0m0.228s
sys     0m0.095s

time make og    
# runs `odgi build` on each .gfa file, creating .og files.

real    0m0.571s
user    0m0.570s
sys     0m0.065s

time make slow-odgi-all-oracles 
# runs each odgi command against each .og file.

real    0m8.376s
user    0m13.965s
sys     0m1.462s

time make slow-odgi-all-tests   
# runs each slow-odgi command against each .gfa file.

real    10m7.876s
user    9m35.973s
sys     0m29.469s

sampsyo commented 1 year ago

$ time make slow-odgi-all-oracles
real    0m8.376s
$ time make slow-odgi-all-tests
real    10m7.876s

8 seconds versus 10 minutes! slow-odgi truly is living up to its name, isn't it? 😂

Anyway, awesome to see that it's not enormously difficult to get some basic timing numbers. To facilitate easy/reproducible benchmarking in the same setup as testing, maybe we want to think creatively about what features we'd add to Turnt to collect that data for us.

anshumanmohan commented 1 year ago

Oh yes yes, if Turnt were interested in getting into the benchmarking business, I'd be super keen to brainstorm on that. Here I was, crying myself to sleep, afraid I'd have to leave its warm embrace...

anshumanmohan commented 1 year ago

I have brought testing back!

  1. I have carved out the fussier but more thoughtful tests (e.g. making interesting GFAs for validate via link-dropping, handmade files...) into a separate Make target called test-slow-odgi-careful. The benchmarking is against the clean, basic suite. To test with full care, run make test-slow-odgi-careful. To test the basic suite, run make test-slow-odgi.
  2. In earlier commits of this branch, I was piping outputs to /dev/null, thus avoiding a lot of the overhead of printing to screen. I have scrapped that, and am printing to screen again. That hack, at least the way I did it, was too destructive. In a future version, I'll use Python's subprocess to orchestrate my benchmarking, and then stdout=subprocess.DEVNULL will achieve this in a cleaner way.
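
A minimal sketch of what the subprocess-based approach in item 2 might look like. The helper name `time_command` and the stand-in command are hypothetical, not code from this branch; the only pieces taken from the discussion are `subprocess` and `stdout=subprocess.DEVNULL`.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd with its stdout discarded; return (elapsed_seconds, returncode).

    Hypothetical helper: it measures wall-clock time only, so it corresponds
    to the `real` line of `time`, not to `user` or `sys`.
    """
    start = time.perf_counter()
    # stdout is dropped so terminal printing does not inflate the timing;
    # stderr is left alone so that errors remain visible.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. From the repository root
    # one would presumably pass something like ["make", "slow-odgi-all-tests"].
    cmd = [sys.executable, "-c", "print('x' * 1000)"]
    elapsed, code = time_command(cmd)
    print(f"{' '.join(cmd)}: {elapsed:.3f}s (exit {code})", file=sys.stderr)
```

Discarding stdout inside the orchestrating script keeps the Make targets and the oracle files untouched, which is what made the earlier /dev/null hack too destructive.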

Here's how to run the benchmark, and the results I see for now:

time make fetch 
real    0m0.610s
user    0m0.200s
sys     0m0.078s

time make og
real    0m0.475s
user    0m0.468s
sys     0m0.059s

time make slow-odgi-all-oracles
real    0m27.687s
user    0m35.349s
sys     0m3.729s

time make slow-odgi-all-tests
real    7m26.518s
user    7m7.473s
sys     0m19.073s
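
For a rough sense of scale, the `real` times reported in this thread can be converted to seconds and compared. The sketch below is only illustrative arithmetic on the numbers already posted; `to_seconds` is a hypothetical helper. The two targets are not a like-for-like comparison: the oracle target runs odgi against .og files, the test target runs slow-odgi against .gfa files, and the later run prints to screen again.

```python
import re

# Matches the `time` output format used in this thread, e.g. "10m7.876s".
_TIME_RE = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Convert a string like '7m26.518s' to a float number of seconds."""
    match = _TIME_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (real time of slow-odgi-all-oracles, real time of slow-odgi-all-tests),
# copied from the two sets of results posted above.
runs = {
    "first run": ("0m8.376s", "10m7.876s"),
    "after restoring tests": ("0m27.687s", "7m26.518s"),
}

for label, (oracles, tests) in runs.items():
    ratio = to_seconds(tests) / to_seconds(oracles)
    print(f"{label}: tests / oracles = {ratio:.1f}x")
```

On these figures the gap is roughly 73x in the first run and roughly 16x in the later one.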