A bunch of little optimizations guided by some profiling, all for the parsing part of polbin.
I used two human pangenome GFAs to measure stuff. Measured on havarti (reporting times to convert GFA -> FlatGFA):
|                   | chr22  | chr8   |
| ----------------- | ------ | ------ |
| original GFA size | 2.4 GB | 3.9 GB |
| FlatGFA size      | 1.5 GB | 2.1 GB |
| before time       | 28 s   | 49 s   |
| after time        | 13 s   | 18 s   |
So that's a 2.2x and 2.7x speedup for the two input graphs, respectively.
Optimizations included:
Getting rid of some collects to avoid allocating vectors.
Replacing usize IDs with u32 IDs.
The big one: optimizing for the (apparently common) case when segment names are sequential numbers, avoiding a hash table that was previously required to look up IDs by name.
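To illustrate that last optimization, here is a minimal sketch (not the actual polbin code; the `NameMap` type and its methods are hypothetical) of a name-to-ID map that stays hash-free as long as segment names arrive as the sequential numbers 1, 2, 3, …, and only falls back to a hash table once a name breaks the pattern:

```rust
use std::collections::HashMap;

/// Maps GFA segment names to dense u32 IDs. In the (apparently common)
/// case where names are the sequential numbers "1", "2", "3", ..., the
/// ID is just the name minus one, so no hash table is needed. This is a
/// hypothetical sketch, not polbin's actual implementation.
#[derive(Default)]
struct NameMap {
    /// Names seen so far are exactly "1" through this value.
    sequential_len: u32,
    /// Fallback for names that break the sequential pattern.
    others: HashMap<String, u32>,
}

impl NameMap {
    /// Record a new segment name, returning its dense ID.
    fn insert(&mut self, name: &str) -> u32 {
        let next = self.sequential_len + self.others.len() as u32;
        if self.others.is_empty() {
            if let Ok(n) = name.parse::<u32>() {
                if n == self.sequential_len + 1 {
                    // Still sequential: no hashing, no allocation.
                    self.sequential_len = n;
                    return next;
                }
            }
        }
        self.others.insert(name.to_string(), next);
        next
    }

    /// Look up the ID for a previously inserted name.
    fn get(&self, name: &str) -> Option<u32> {
        if let Ok(n) = name.parse::<u32>() {
            if n >= 1 && n <= self.sequential_len {
                return Some(n - 1);
            }
        }
        self.others.get(name).copied()
    }
}
```

For graphs with purely numeric sequential names (like these pangenome GFAs), every lookup is just an integer parse and a bounds check.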
Next steps would be:
Roll my own (regex-free) GFA parser.
Avoid the memcpy stage by pre-allocating big slabs of memory and parsing directly into there. Requires estimating the sizes of things, which seems hard?
Figure out why the "path steps" parser looms so weirdly large in the time profile.
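For the first of those next steps, a regex-free parser mostly means splitting each line on tabs by hand. A tiny sketch for GFA "S" (segment) lines, which in GFA 1.0 are `S <tab> name <tab> sequence [<tab> tags...]` (the function name here is made up for illustration):

```rust
/// Parse one GFA 1.0 segment line ("S\t<name>\t<sequence>[\t<tags>...]")
/// without regexes, returning borrowed (name, sequence) slices.
/// A hypothetical sketch of the "roll my own parser" idea.
fn parse_segment(line: &str) -> Option<(&str, &str)> {
    let mut fields = line.split('\t');
    if fields.next()? != "S" {
        return None; // not a segment line
    }
    let name = fields.next()?;
    let seq = fields.next()?;
    Some((name, seq))
}
```

Returning borrowed slices rather than owned `String`s would also play nicely with the slab-preallocation idea, since the parsed pieces could be copied straight into their final home.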