cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
27 stars 1 forks source link

`slow_odgi`: proofs #87

Closed anshumanmohan closed 1 year ago

anshumanmohan commented 1 year ago

This PR is a modest step towards #85.

We'll add further checks as we think of them, but for now I have paths_logically_le. I take a path-forward approach and ask, in the relevant commands (just chop and inject for now) that the graph we pass is "less than or equal to" the graph we get back. That is, every path that I used to have in the input graph still exists, with the same name and sequence, in the new graph.

flip does satisfy this condition spiritually, but does not satisfy it as written. The reason is silly: flip adds the string "_inv" to the names of the paths that it flips. We do this because odgi does. If we are okay with diverging from odgi in this regard, I can remove this addition of "_inv".

85 talks about logical equality, but I've come up with this logical <= approach because I think there's some value there. We can state == using <= and antisymmetry.

anshumanmohan commented 1 year ago

Terribly sorry to have added other PRs onto this one!

The story is that I added a tiny CI thing to teach myself about it, but didn't realize that large chunks of our existing code would fail the CI tests. When I then created this small PR, all those failures came up. The failing CI made me panic needlessly and I decided to fix those issues right here, but I should have gone about it via a separate PR.

To clarify, this PR now has three things in it:

  1. The proof stuff, as described above.
  2. Formatting changes all over pollen-py/, to make Black happy.
  3. A rename of pollen-py/ into pollen_py/, to my MyPy happy.

The last two are not deep, but they have big diffs. Thankfully, AFAIK, no-one else is currently working on those files. If you are, and if my changes have made life hard for you, please let me know. I'll revert this stuff and make it right!

anshumanmohan commented 1 year ago

I'll note that #89 also got rid of a lot of GFA and temp files that IMO should never have been committed! Please let me know if I made a mistake!

sampsyo commented 1 year ago

Super cool! Defining a notion of graph equivalence (or refinement) seems to be the first step toward checking program correctness. That is, we decide something like paths_logically_le is the right predicate, we can pick the theorem we want to prove for "no-op" operations g as something like: for g, paths_logically_le(g, f(g)).

About the names and the _inv suffix: it's not clear to me whether we'd want to prove something in the presence of renaming, or change the predicate to ignore names (i.e., to just require that there are the same set of paths in each). Interesting question! But maybe not too hard to manage under either choice.

And no worries about the PR noise; I just looked at the one new proofs.py file.