cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
24 stars 1 forks source link

Btmg redux #59

Closed anshumanmohan closed 1 year ago

anshumanmohan commented 1 year ago

Incorporating the changes that were suggested in #50 after I closed it too early. Plus two other things.

Makefile unzipping

Earlier I had a Makefile like:

test-slow-odgi: test-slow-chop test-slow-inject ...

test-slow-chop: og
  turnt --save ... chop_oracle
  turnt ... chop_test

test-slow-inject: og
  turnt --save ... inject_setup
  turnt --save ... inject_oracle
  turnt ... inject_test

...

Now I have a new style:

test-slow-odgi: slow-odgi-all-tests

slow-odgi-all-oracles
  turnt --save ... chop_oracle
  turnt --save ... inject_setup
  turnt --save ... inject_oracle
  ...

slow-odgi-all-tests: slow-odgi-all-oracles
  turnt ... chop_test
  turnt ... inject_test
  ...

That is, I've "unzipped", for each algorithm, (1) the setup/oracle and (2) the test. This is nice because we can generate the oracles once, noisily (it's noisy because they are all --save commands and turnt warns us with not ok ... # missing each time). This basically gets us into expect-testing land. We can now run the slow-odgi-all-tests target repeatedly against the same expect files. Any not oks here are meaningful.

One problem I'm having here is that make is not behaving as I understand it should. I would have thought that, if I had

slow-odgi-all-tests: slow-odgi-all-oracles

and I made all-tests repeatedly without touching the oracles, the all-oracles script would not be run after the first time: the dependency would be satisfied for free based on which files were newer. This does not seem to be happening; the all-oracles stuff is run every time.

No more load-bearing strings in flip

Important background: odgi really does add "_inv" to the name of a path if it flips it. That bit wasn't made up by me...

My previous system was:

  1. For each path, check if it needs to be flipped. If it does, add "_inv" to its name and flip its segments. The dict containing these paths, whether flipped just now or not, is the new paths.
  2. For each path in the dict that has "_inv" in its name, generate new links that make this path valid.

My new system is:

  1. For each path, check if it needs to be flipped. If it does, add "_inv" to its name and flip its segments. The dict of these paths, whether flipped just now or not, plus a bool saying whether we just flipped it, is paths_dec. The dec is for "decorated".
  2. For each path in the dict that is decorated with an "I was just flipped" bool, generate new links that make this path valid.
  3. Strip paths_dec of its bools to create the new paths, which is now as before.

The point is that, earlier, an input graph that already had "_inv" in a path name would have led to link-generation, regardless of whether the path required flipping and therefore link-generation. Indeed, I think iterated calls to flip would have been unnecessarily complex (and eventually, after dedup, no-ops) instead of just being simple no-ops. I think the new style fixes that.

sampsyo commented 1 year ago

I've got a more complete review in my queue, but for now:

I would have thought that, if I had

slow-odgi-all-tests: slow-odgi-all-oracles

and I made all-tests repeatedly without touching the oracles, the all-oracles script would not be run after the first time: the dependency would be satisfied for free based on which files were newer. This does not seem to be happening; the all-oracles stuff is run every time.

Classic Make only does this "redo work only if something has changed" based on files. That is, it knows it needs to rebuild foo.o based on foo.c if the filesystem modification time for the foo.c file is newer than the foo.o file. So for "phony" targets like test-* that don't correspond to files, Make just assumes everything needs to be done every time.