cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

Expand `pollen_data_gen simple` #102

Closed: anshumanmohan closed this PR 1 year ago.

anshumanmohan commented 1 year ago

This PR expands the `simple` option of `pollen_data_gen`, as sketched out in #96.

That is, `simple` now accepts a graph and emits a lengthy but valid JSON document that has:

  1. No information loss, so a roundtrip (graph to JSON and back to the same graph) is possible.
  2. All the information that a depth accelerator needs from the graph.
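To make the two properties concrete, here is a minimal Python sketch over a hypothetical toy graph: a dict of segment sequences plus paths given as ordered lists of `(segment, orientation)` steps. For illustration only, it assumes node depth means the total number of path visits to a segment, and it also records how many distinct paths visit each segment. The function names `node_depth` and `encode_simple` and the keys `segments`, `paths`, `depth`, `visits`, and `uniq_paths` are made up here; they are not the actual `pollen_data_gen` schema.

```python
import json

# Hypothetical toy graph: segment name -> sequence,
# path name -> ordered list of (segment name, orientation) steps.
SEGMENTS = {"1": "ACGT", "2": "GG", "3": "TTA"}
PATHS = {
    "x": [("1", "+"), ("2", "+"), ("3", "+")],
    "y": [("1", "+"), ("3", "-"), ("1", "+")],
}


def node_depth(segments, paths):
    """Return (visits, distinct) per segment.

    visits[s]   = total number of path steps that land on segment s
    distinct[s] = number of different paths that visit s at least once
    """
    visits = {seg: 0 for seg in segments}
    seen_by = {seg: set() for seg in segments}
    for name, steps in paths.items():
        for seg, _orientation in steps:
            visits[seg] += 1
            seen_by[seg].add(name)
    distinct = {seg: len(names) for seg, names in seen_by.items()}
    return visits, distinct


def encode_simple(segments, paths):
    """Hypothetical 'simple'-style output: full graph data plus a depth section."""
    visits, distinct = node_depth(segments, paths)
    return {
        # Lossless part: everything needed to rebuild the toy graph.
        "segments": dict(segments),
        "paths": {name: [seg + ori for seg, ori in steps] for name, steps in paths.items()},
        # Accelerator-facing part, selectable on its own (e.g. with `jq .depth`).
        "depth": {"visits": visits, "uniq_paths": distinct},
    }


if __name__ == "__main__":
    print(json.dumps(encode_simple(SEGMENTS, PATHS), indent=2))
```

Keeping the depth data under its own top-level key is what makes a plain `jq .depth` sufficient to pull out the accelerator-facing part, while the rest of the document stays available for reconstructing the graph.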

The depth-specific data can simply be extracted from the `simple` output with `jq`. It matches exine's output exactly, so this is what we now test against in turnt. Try it with:

```
pollen_data_gen simple test/note5.gfa | jq .depth
```

To check that this is lossless, run:

```
pollen_data_gen roundtrip test/note5.gfa
```

Silence is good: it means that no equality assertion failed.
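The roundtrip check can be sketched in the same spirit: encode, serialize to JSON text, parse, decode, and assert equality, so that a successful run prints nothing. This is a hypothetical Python sketch, not the PR's actual encoder; `encode`, `decode`, `roundtrip`, and the step-string convention (a non-empty segment name followed by a single `+` or `-`) are assumptions for illustration.

```python
import json


def encode(segments, paths):
    """Turn a toy graph into JSON-friendly data; each path step becomes e.g. '3-'."""
    return {
        "segments": dict(segments),
        "paths": {name: [seg + ori for seg, ori in steps] for name, steps in paths.items()},
    }


def decode(data):
    """Invert encode(): split each step string back into (segment, orientation)."""
    segments = dict(data["segments"])
    paths = {
        name: [(step[:-1], step[-1]) for step in steps]
        for name, steps in data["paths"].items()
    }
    return segments, paths


def roundtrip(segments, paths):
    """Encode to JSON text, parse it back, and assert nothing was lost."""
    text = json.dumps(encode(segments, paths))
    segments2, paths2 = decode(json.loads(text))
    # Silence is good: these only raise AssertionError if information was lost.
    assert segments2 == segments, "segments differ after roundtrip"
    assert paths2 == paths, "paths differ after roundtrip"


if __name__ == "__main__":
    roundtrip(
        {"1": "ACGT", "2": "GG", "3": "TTA"},
        {"x": [("1", "+"), ("2", "+"), ("3", "+")], "y": [("1", "+"), ("3", "-"), ("1", "+")]},
    )
```

The explicit `decode` step matters because JSON has no tuple type: storing steps as `(segment, orientation)` tuples directly would come back from `json.loads` as lists and fail the equality check, so each step is flattened to a string and split again on the way back.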

This code is all rather idiosyncratic and messy, and I'd appreciate any thoughts on making it nicer. The docstrings explain the encoding/decoding rules I made up as I went along.

Closes #96.