A bunch of little optimizations guided by some profiling, all for the parsing part of polbin.
I used two human pangenome GFAs to measure stuff. Measured on havarti (reporting times to convert GFA -> FlatGFA):
|                   | chr22  | chr8   |
| ----------------- | ------ | ------ |
| original GFA size | 2.4 GB | 3.9 GB |
| FlatGFA size      | 1.5 GB | 2.1 GB |
| before time       | 28 s   | 49 s   |
| after time        | 13 s   | 18 s   |
So that's a 2.2x and 2.7x speedup for the two input graphs, respectively.
Optimizations included:
Getting rid of some collects to avoid allocating vectors.
Replacing usize IDs with u32 IDs.
The big one: optimizing for the (apparently common) case when segment names are sequential numbers, avoiding a hash table that was previously required to look up IDs by name.
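To illustrate that last optimization, here is a minimal sketch (not the actual polbin code; the `NameMap` type and its methods are hypothetical) of a name-to-ID map that stays hash-free as long as segment names arrive as the sequential numbers 1, 2, 3, …, and only falls back to a hash table once a name breaks the pattern:

```rust
use std::collections::HashMap;

/// Maps GFA segment names to dense u32 IDs. In the (apparently common)
/// case where names are the sequential numbers "1", "2", "3", ..., the
/// ID is just the name minus one, so no hash table is needed. This is a
/// hypothetical sketch, not polbin's actual implementation.
#[derive(Default)]
struct NameMap {
    /// Names seen so far are exactly "1" through this value.
    sequential_len: u32,
    /// Fallback for names that break the sequential pattern.
    others: HashMap<String, u32>,
}

impl NameMap {
    /// Record a new segment name, returning its dense ID.
    fn insert(&mut self, name: &str) -> u32 {
        let next = self.sequential_len + self.others.len() as u32;
        if self.others.is_empty() {
            if let Ok(n) = name.parse::<u32>() {
                if n == self.sequential_len + 1 {
                    // Still sequential: no hashing, no allocation.
                    self.sequential_len = n;
                    return next;
                }
            }
        }
        self.others.insert(name.to_string(), next);
        next
    }

    /// Look up the ID for a previously inserted name.
    fn get(&self, name: &str) -> Option<u32> {
        if let Ok(n) = name.parse::<u32>() {
            if n >= 1 && n <= self.sequential_len {
                return Some(n - 1);
            }
        }
        self.others.get(name).copied()
    }
}
```

For graphs with purely numeric sequential names (like these pangenome GFAs), every lookup is just an integer parse and a bounds check.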
Next steps would be:
Roll my own (regex-free) GFA parser.
Avoid the memcpy stage by pre-allocating big slabs of memory and parsing directly into there. Requires estimating the sizes of things, which seems hard?
Figure out why the "path steps" parser looms so weirdly large in the time profile.
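For the first of those next steps, a regex-free parser mostly means splitting each line on tabs by hand. A tiny sketch for GFA "S" (segment) lines, which in GFA 1.0 are `S <tab> name <tab> sequence [<tab> tags...]` (the function name here is made up for illustration):

```rust
/// Parse one GFA 1.0 segment line ("S\t<name>\t<sequence>[\t<tags>...]")
/// without regexes, returning borrowed (name, sequence) slices.
/// A hypothetical sketch of the "roll my own parser" idea.
fn parse_segment(line: &str) -> Option<(&str, &str)> {
    let mut fields = line.split('\t');
    if fields.next()? != "S" {
        return None; // not a segment line
    }
    let name = fields.next()?;
    let seq = fields.next()?;
    Some((name, seq))
}
```

Returning borrowed slices rather than owned `String`s would also play nicely with the slab-preallocation idea, since the parsed pieces could be copied straight into their final home.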