BurntSushi / suffix

Fast suffix arrays for Rust (with Unicode support).
The Unlicense
263 stars 30 forks source link

Performances #8

Closed delehef closed 4 years ago

delehef commented 6 years ago

Hi,

I needed a high performance suffix array lib some time ago for a bioinformatics project, and I ended up working with a crude FFI with divsufsort.

Just so you know, its performances while creating the array is blowing suffix out of the water. So feel free to take a glance if it interests you or to close this issue otherwise ;)

BurntSushi commented 6 years ago

Just so you know, its performances while creating the array is blowing suffix out of the water.

Can you please show your benchmarks? Otherwise, I'm not really sure what to do with this issue.

delehef commented 6 years ago

My issue was a bit off the mill and I apologize if I went off as confrontational.

Here is a toy repo with a minimal example; you just have to gunzip the msy.fasta.gz file before either cargo bench- or cargo run-ing. I advise you to run it normally first, the benchmark mode is awfully long in this case.

With my computer, I get these results:

18 matches
PT3.080668480S seconds for divsufsort.
18 matches
PT9.860067663S seconds for suffix.

If you would like to try an example with larger files, you can download this one ftp://ftp.ensemblgenomes.org/pub/plants/release-40/fasta/arabidopsis_thaliana/dna/Arabidopsis_thaliana.TAIR10.dna.toplevel.fa.gz for instance, and just adapt the filename in main.rs, line 11.

BurntSushi commented 6 years ago

Does divsufsort support the same stuff as this crate? Does it handle Unicode correctly?

delehef commented 6 years ago

Neither; it is much more crude and it only works when looking for a bytes needle in a bytes array. I just thought you might be interesting by their algorithm.

BurntSushi commented 6 years ago

Thanks! I'll take a look at this when I get a chance.

delehef commented 6 years ago

You're most welcome! On an aside, thanks for everything you're doing in the Rust community.

BurntSushi commented 4 years ago

I'm going to close this out as stale. If I ever get motivated enough to dive back into this space, then I'll investigate performance more closely, but I'm otherwise happy with where this crate is currently at.