ParBLiSS / FastANI

Fast Whole-Genome Similarity (ANI) Estimation
Apache License 2.0

Minimizers or index saved for each genome? #27

Open LongTianPy opened 5 years ago

LongTianPy commented 5 years ago

Hi sir,

Querying one genome against ~2000 genomes was very slow. I checked the paper's experiment on 90k prokaryotic genomes and found that indexing takes the majority of the runtime, so I wonder whether the minimizers of each genome could be saved (like a MinHash sketch or signature) so they don't have to be recreated every time?
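To make the request concrete, here is a minimal sketch of what "save the minimizers once, reload them later" could look like. This is not FastANI's code or file format: the functions `kmer_hash`, `minimizers`, and `load_or_build`, the pickle cache, and the `k`/`w` defaults are all hypothetical and chosen only for illustration.

```python
import hashlib
import pickle
from pathlib import Path


def kmer_hash(kmer: str) -> int:
    """Stable 64-bit hash; Python's built-in hash() is salted per process."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizers(seq: str, k: int = 16, w: int = 24) -> list:
    """Return (hash, position) of the smallest k-mer hash in each window
    of w consecutive k-mers, with consecutive duplicates collapsed."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    picked = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        pos = start + window.index(smallest)  # leftmost minimum in the window
        if not picked or picked[-1] != (smallest, pos):
            picked.append((smallest, pos))
    return picked


def load_or_build(seq: str, cache_path: Path, k: int = 16, w: int = 24) -> list:
    """Reuse a saved sketch if one exists, otherwise compute and save it.
    (Hypothetical cache layout: one pickle file per genome.)"""
    if cache_path.exists():
        with cache_path.open("rb") as fh:
            return pickle.load(fh)
    sketch = minimizers(seq, k, w)
    with cache_path.open("wb") as fh:
        pickle.dump(sketch, fh)
    return sketch
```

With something like this, the per-genome cost would be paid once per reference, and later queries would only pay for reading the file. A real implementation would also need to record `k`, `w`, and the hash function alongside the sketch so that stale caches are not reused with different settings.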

cjain7 commented 5 years ago

Hi, currently we don't support saving the index on disk, but it's on my TODO list. BTW, I hope you are using multi-threaded execution. But there's no doubt that indexing would take the majority of the time.
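For reference, a one-to-many run with threads can be assembled as below. This is only a sketch: the helper `build_fastani_command` and all file names are hypothetical, and it relies on the `-q`, `--rl`, `-o`, and `-t` options of the fastANI command line. It does not avoid indexing; it only makes sure the references are passed as a single list in one multi-threaded run instead of one invocation per reference.

```python
from pathlib import Path


def build_fastani_command(query, references, ref_list, output, threads=8):
    """Write the reference paths to a list file (one path per line) and
    return the argv for a single one-to-many fastANI run.
    Hypothetical helper; paths and thread count are placeholders."""
    ref_list = Path(ref_list)
    ref_list.write_text("".join(f"{path}\n" for path in references))
    return [
        "fastANI",
        "-q", str(query),      # single query genome
        "--rl", str(ref_list), # file listing the reference genomes
        "-o", str(output),     # output file
        "-t", str(threads),    # thread count
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` on a machine where fastANI is installed.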

LongTianPy commented 5 years ago

Thanks for the reply. What I was looking for, though, is a way to save the index of each reference genome as a physical file.

bsiranosian commented 4 years ago

I support this feature in the future! Would love a full index of refseq for easy querying...

MrOlm commented 3 years ago

Came here to say I would use this feature as well!

bsiranosian commented 3 years ago

Maybe one day... !

jlumpe commented 2 years ago

Did this ever end up getting implemented?

cjain7 commented 2 years ago

No, sorry, I don't have cycles for this.

jorondo1 commented 7 months ago

Watching this thread. With several hundred MAGs to compare against ~30k genomes, saving the reference index would save an unimaginable amount of time. Cheers!