cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License
24 stars 1 forks source link

Proofs about our commands #85

Open anshumanmohan opened 1 year ago

anshumanmohan commented 1 year ago

What if we could prove that Pollen DSL programs preserve the semantics of variation graphs? That is, some but not all odgi commands are "logical no-ops," in the sense that they may split up segments and rename things and stuff, but the set of nucleotide sequences they represent (i.e., odgi flatten would still produce the same flat sequences). Could we somehow prove that these programs are no-ops, so they're free of "bugs"?

Originally posted by @sampsyo in https://github.com/cucapra/pollen/pull/84#pullrequestreview-1417108141

I love this idea!

I'd love to be able to state and prove, for example:

forall g, 
valid g -> 
exists g', flip g g' /\ valid g' /\ meaning g == meaning g'

where

  1. flip a b is a relation, in this case a function, that says that b is the flipped version of a. i.e, all the good stuff stated here.
  2. meaning x is some beautiful encapsulation of the semantics of a graph x, rising above all the kludge of segment-splits and names.
anshumanmohan commented 1 year ago

Bit of progress on this, still in slow_odgi for now.

I have done up paths_logically_le. I take a path-forward approach and assert, in the relevant commands (just chop and inject for now) that the graph we pass is "less than or equal to" the graph we get back. That is, every path that I used to have in the input graph still exists, with the same name and sequence, in the new graph.

This issue talks about logical equality, but I've come up with this logical <= approach because I think there's some value there. We can state == using <= and antisymmetry.