Context: We have previously tried these ways of reading GFA files: pygfa, gfapy, going through odgi itself and interacting with its API through its Python FFI wrapper, and (at one point, I think) rolling our own ad hoc parser (my memory is hazy but I think @dc854 did this?). All of these have been kind of unsatisfactory in their own way—the Python libraries have broken in mysterious ways, talking to odgi has the ordinary practical challenges of any Python FFI, and our hand-rolled text parser was pretty algorithm-specific and didn't have a general Python data model.
What this is: I tossed together an extremely simple Python parser for GFA files. The goals are:
- Be an all-Python thing; no C++ FFI required.
- Avoid the morass of complexity of full-scale Python GFA libraries by just supporting what we need.
- Understand the abstract data model for GFA/og files by writing the data structure myself.
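To make those goals a bit more concrete, here's a minimal sketch of what an all-Python parser with its own small data model can look like. This is *not* the code in `mygfa.py`. The names `Handle`, `Link`, `Graph`, `parse_handle`, and `parse_gfa` are hypothetical, and the sketch assumes tab-separated GFA 1.0 input and handles only S (segment), L (link), and P (path) lines, skipping everything else.

```python
"""Hypothetical sketch of a tiny all-Python GFA 1.0 parser.

Only S, L, and P lines are handled; headers, tags, and other record
types are skipped. Names are illustrative, not taken from mygfa.py.
"""
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Handle:
    """A segment name plus an orientation (True means forward, '+')."""
    name: str
    forward: bool


@dataclass
class Link:
    """An edge between two oriented segments, with its overlap (CIGAR or '*')."""
    from_: Handle
    to: Handle
    overlap: str


@dataclass
class Graph:
    segments: Dict[str, str] = field(default_factory=dict)  # name -> sequence
    links: List[Link] = field(default_factory=list)
    paths: Dict[str, List[Handle]] = field(default_factory=dict)  # name -> steps


def parse_handle(text: str) -> Handle:
    """Parse an oriented path step such as '12+' or '7-'."""
    return Handle(text[:-1], text[-1] == "+")


def parse_gfa(lines: Iterable[str]) -> Graph:
    graph = Graph()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        kind = fields[0]
        if kind == "S":
            # S <name> <sequence> [tags...]
            graph.segments[fields[1]] = fields[2]
        elif kind == "L":
            # L <from> <from_orient> <to> <to_orient> <overlap>
            graph.links.append(Link(
                Handle(fields[1], fields[2] == "+"),
                Handle(fields[3], fields[4] == "+"),
                fields[5],
            ))
        elif kind == "P":
            # P <name> <comma-separated oriented steps> <overlaps>
            graph.paths[fields[1]] = [parse_handle(s) for s in fields[2].split(",")]
        # Any other line (H, W, C, comments, blank lines) is ignored here.
    return graph
```

Because `parse_gfa` just iterates over lines, `parse_gfa(sys.stdin)` matches the `python3 mygfa.py < file.gfa` style of invocation used in the test loop.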
This seems to work in the sense that it doesn't crash while parsing the GFA files in our current test suite. Here's a quick test:
```sh
for fn in test/*.gfa ; do echo "$fn" ; python3 mygfa.py < "$fn" > /dev/null ; done
```
Most files go fast; `chr8.pan` also works but takes ~16 minutes on my laptop.
Why you should care: Two reasons:
1. I humbly suggest that this could be the parser component for the GFA->JSON converter that we've been talking about. It will be more convenient than using the odgi FFI and less confusing than using the existing Python libraries. Or I dunno, maybe we want to throw this away and port it to Rust?
2. Perhaps, like me, you will find the abstract data model edifying w/r/t what data actually exists in GFA files. The text format is very confusing to read, so maybe you will find the types here clearer than trying to parse the GFA2 spec.
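For the GFA->JSON idea, here's a rough sketch of how the serialization half could go once the data model is plain dataclasses. The types are hypothetical stand-ins (not the ones in `mygfa.py`), and the JSON layout is just one assumed choice. The point is that `dataclasses.asdict` plus `json.dumps` does most of the work, and going back is a small amount of hand-written reconstruction.

```python
"""Hypothetical sketch of the serialization half of a GFA->JSON converter.

The dataclasses are illustrative stand-ins, not the types in mygfa.py,
and the JSON layout is an assumed choice.
"""
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Step:
    """One oriented visit to a segment along a path."""
    segment: str
    forward: bool


@dataclass
class Graph:
    segments: Dict[str, str] = field(default_factory=dict)  # name -> sequence
    paths: Dict[str, List[Step]] = field(default_factory=dict)  # name -> steps


def graph_to_json(graph: Graph) -> str:
    # asdict recurses through nested dataclasses, dicts, and lists,
    # producing plain Python objects that json.dumps can serialize.
    return json.dumps(asdict(graph), sort_keys=True)


def graph_from_json(text: str) -> Graph:
    # Rebuild the dataclasses from the plain JSON structure.
    data = json.loads(text)
    return Graph(
        segments=data["segments"],
        paths={name: [Step(**s) for s in steps]
               for name, steps in data["paths"].items()},
    )
```

For example, `graph_to_json(Graph({"1": "ACGT"}, {"x": [Step("1", True)]}))` yields `{"paths": {"x": [{"forward": true, "segment": "1"}]}, "segments": {"1": "ACGT"}}`, and `graph_from_json` turns that string back into an equal `Graph`.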
Perhaps this needs to go into the `pollen` package; for now it's at the top level just to show you what it looks like.