dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Any preconfigured databases? #88

Open jolespin opened 2 years ago

jolespin commented 2 years ago

I'm looking for a method that can say whether an assembly is prokaryotic or eukaryotic. Was thinking this software might be helpful. Do you have any preconfigured databases that have both prokaryotes and eukaryotes?

dnbaker commented 2 years ago

Hi there!

We don't have any preconfigured databases currently, but that is something we plan to put together for Dashing2 in the near future.

You'd have to download the set of genomes from RefSeq. I have a script which you could use to download them, at which point you could compare your assembly against them.

You could do something like the following:

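# Download the RefSeq genomes
python3 download_genomes.py all
# List the reference genomes and the query assembly, one path per line
find ref -name '*fna.gz' > refs.txt
echo "$PATH_TO_ASSEMBLY" > query.txt
# Compare the query against every reference (k=11, 24 threads)
dashing dist -Q query.txt -F refs.txt -k11 -Orefseq.matches -o refseq.sizes -p24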

But a pre-built database would be much easier to work with, and you wouldn't need the disk space. I'll let you know when that changes.
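
Once the comparison finishes, the next step is to see which references sit closest to the assembly. Below is a rough sketch of that, not a documented dashing feature. It assumes refseq.matches is a tab-separated matrix with a header row of reference paths and a single query row, and that larger values mean more similar; none of that is confirmed in this thread, so check the file before relying on it. `top_matches` is a hypothetical helper name.

```shell
# Sketch only. ASSUMPTIONS (not confirmed by the thread):
#   - the matrix is tab-separated
#   - line 1 is a header: a label, then one reference path per column
#   - each later line is a query path followed by one value per reference
#   - larger values mean "more similar"
# top_matches is a hypothetical helper, not part of dashing.
top_matches() {
    # $1: matrix file, $2: number of hits to print (default 5)
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }  # remember column names
        { for (i = 2; i <= NF; i++) print $i "\t" name[i] }       # emit value<TAB>reference
    ' "$1" | sort -k1,1gr | head -n "${2:-5}"                     # largest values first
}
```

Calling `top_matches refseq.matches 10` would then print the ten highest-scoring references, and the taxonomy of those hits gives a first hint as to whether the assembly looks prokaryotic or eukaryotic. If the values turn out to be distances rather than similarities, drop the `r` from the sort key.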

Thanks!

Daniel