fulcrumgenomics / fqtk

Fast FASTQ sample demultiplexing in Rust.
MIT License

DRAFT: SortFastq #38

Open kockan opened 1 year ago

kockan commented 1 year ago

Initial attempt at #3. Tested on an empty FASTQ as well as a ~10GB one, and it seems to be working.

| max-records | fgbio | fqtk |
| --- | --- | --- |
| 500000 | 186.07s user 19.88s system 130% cpu 2:37.46 total | 221.10s user 27.20s system 82% cpu 4:59.44 total |
| 1000000 | 194.21s user 21.99s system 114% cpu 3:08.19 total | 226.42s user 26.46s system 84% cpu 4:57.96 total |
| 2000000 | 196.37s user 13.59s system 170% cpu 2:02.82 total | 228.77s user 24.76s system 89% cpu 4:44.37 total |
| 4000000 | 212.98s user 13.57s system 202% cpu 1:52.13 total | 230.88s user 20.35s system 119% cpu 3:29.84 total |

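Looks like fgbio is consistently faster: fqtk takes roughly 1.6x to 2.3x as long in wall-clock time at every max-records setting, and also uses somewhat more user and system time. I should look into why. Is it the library, the sorting algorithm, or my usage?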
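For anyone comparing approaches, here is a minimal sketch of the general technique that a max-records option usually implies: sort the input in chunks of at most that many records, then k-way merge the sorted runs. This is not the code in this PR and not fgbio's implementation. The `Record` type and `sort_in_chunks` function are hypothetical, the sort key is assumed to be the read name, and the sorted runs stay in memory for brevity where a real tool would spill each run to a temporary file.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical minimal FASTQ record (not fqtk's or fgbio's actual type).
/// `name` is declared first so the derived ordering compares read names
/// first, with `seq` and `qual` only breaking ties.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Record {
    name: String,
    seq: String,
    qual: String,
}

/// Sort records by name, buffering at most `max_records` unsorted records
/// at a time. ASSUMPTION: sorted runs are kept in memory here; a real
/// implementation would write each run to disk and stream it back.
fn sort_in_chunks<I>(records: I, max_records: usize) -> Vec<Record>
where
    I: IntoIterator<Item = Record>,
{
    assert!(max_records > 0, "max_records must be positive");

    // Phase 1: cut the input into sorted runs of at most `max_records`.
    let mut runs: Vec<Vec<Record>> = Vec::new();
    let mut chunk: Vec<Record> = Vec::new();
    for rec in records {
        chunk.push(rec);
        if chunk.len() == max_records {
            chunk.sort();
            runs.push(std::mem::take(&mut chunk));
        }
    }
    if !chunk.is_empty() {
        chunk.sort();
        runs.push(chunk);
    }

    // Phase 2: k-way merge. The heap holds the current head of each run,
    // wrapped in `Reverse` so the smallest record is popped first.
    let mut iters: Vec<std::vec::IntoIter<Record>> =
        runs.into_iter().map(Vec::into_iter).collect();
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(rec) = it.next() {
            heap.push(Reverse((rec, i)));
        }
    }

    let mut out = Vec::new();
    while let Some(Reverse((rec, i))) = heap.pop() {
        out.push(rec);
        // Refill the heap from the run the popped record came from.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```

With this shape, a smaller max-records value means more runs and a larger merge heap, so differences in how runs are written, read back, and merged are one plausible place to look when comparing the two tools.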

At minimum I should add unit tests. Any feedback is welcome if there is dev interest in merging.