cucapra / pollen

generating hardware accelerators for pangenomic graph queries
MIT License

FlatGFA: Optimize GFA parsing a bit #153

Closed: sampsyo closed this 3 months ago

sampsyo commented 3 months ago

A bunch of little optimizations guided by some profiling, all for the parsing part of polbin.

I measured with two human pangenome GFAs, chr22 and chr8, on havarti. The times below are for converting GFA -> FlatGFA:

| | chr22 | chr8 |
|---|---|---|
| Original GFA size | 2.4 GB | 3.9 GB |
| FlatGFA size | 1.5 GB | 2.1 GB |
| Conversion time, before | 28 s | 49 s |
| Conversion time, after | 13 s | 18 s |

So that's a 2.2x and 2.7x speedup for the two input graphs, respectively.
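Each speedup is the old conversion time divided by the new one (28/13 ≈ 2.2, 49/18 ≈ 2.7). The small Python sketch below recomputes those figures from the table above, along with how large each FlatGFA file is relative to its source GFA. The dictionary layout and the helper names `speedup` and `size_ratio` are made up for illustration; they are not part of polbin.

```python
# Numbers copied from the table above: sizes in GB, conversion times in seconds.
# The dict layout and helper names are illustrative only, not part of polbin.
MEASUREMENTS = {
    "chr22": {"gfa_gb": 2.4, "flatgfa_gb": 1.5, "before_s": 28, "after_s": 13},
    "chr8": {"gfa_gb": 3.9, "flatgfa_gb": 2.1, "before_s": 49, "after_s": 18},
}


def speedup(before_s: float, after_s: float) -> float:
    """Speedup = old runtime / new runtime (higher is better)."""
    return before_s / after_s


def size_ratio(flatgfa_gb: float, gfa_gb: float) -> float:
    """Fraction of the original GFA size that the FlatGFA file occupies."""
    return flatgfa_gb / gfa_gb


if __name__ == "__main__":
    for name, m in MEASUREMENTS.items():
        s = speedup(m["before_s"], m["after_s"])
        r = size_ratio(m["flatgfa_gb"], m["gfa_gb"])
        print(f"{name}: {s:.1f}x speedup, FlatGFA is {r:.2f} of the GFA size")
```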

Optimizations included:

Next steps would be: