bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Formatted NR database #790

Open stephen-14 opened 8 months ago

stephen-14 commented 8 months ago

Hello everyone, sorry for the basic question, but could anyone help me get a DIAMOND makedb database? I'm trying to use DIAMOND to search against the NR database to get taxonomic information; specifically, I want TaxIDs. However, my PC's capacity is small (256 GB) while the database is large, and makedb is always interrupted with "no space left on device". I've tried extending the swapfile, but that does not seem to work. I use this command:

diamond makedb --in nr.gz --db nr_diamond.dmnd --taxonmap prot.accession2taxid.gz --taxonnodes nodes.dmp --taxonnames names.dmp

Could you share the formatted NR database, or tell me where I could download it? Many thanks!

bbuchfink commented 8 months ago

It's not available for download anywhere. You need a system with a larger hard drive.
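Since the failure above is a disk-space problem, a pre-flight check can save a long run that would fail anyway. The following is only a minimal sketch: the helper names `free_kb` and `has_space` are hypothetical, and the required size is a value you would have to estimate yourself, since the thread gives no figure for the formatted NR database.

```shell
#!/bin/sh
# Sketch: check free space on the target filesystem before a long makedb run.
# free_kb and has_space are hypothetical helper names, not part of DIAMOND.

# free_kb DIR: print the space available on DIR's filesystem, in KiB.
free_kb() {
    # -P forces the POSIX one-line-per-filesystem format, -k reports KiB;
    # field 4 of the second line is the "Available" column.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_space DIR REQUIRED_KB: succeed only if at least REQUIRED_KB KiB are free.
has_space() {
    avail=$(free_kb "$1") || return 1
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

It could be used as `has_space /path/to/db_dir "$REQUIRED_KB" && diamond makedb --in nr.gz --db nr_diamond.dmnd`, where `/path/to/db_dir` and `REQUIRED_KB` are placeholders for the output directory and your own size estimate.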

katievigil commented 6 months ago

Hi @bbuchfink, I am updating my DIAMOND database with the current NCBI viral protein RefSeq. I have only created the database once, and I was wondering if this is the correct way to do it to get the full taxonomic lineage (including viral family). I know you added taxonomic family to the current version of DIAMOND; is this information included in names.dmp or in other .dmp files?

Here is how I am creating my database:

diamond makedb --in /work/kvigil/diamond/ -d viralprotein.050124 --taxonmap /work/kvigil/diamond/prot.accession2taxid.FULL.gz --taxonnames /work/kvigil/diamond/names.dmp --taxonnodes /work/kvigil/diamond/nodes.dmp

I am not sure whether I need fullnamelineage.dmp. Thanks for your help, and thank you for adding the family taxonomy!
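One thing that may be worth checking before running the command above: as far as I know, the DIAMOND documentation describes `--in` as the path to a FASTA file (optionally gzip-compressed), not a directory. The sketch below is a small sanity check under that assumption; `check_makedb_inputs` is a hypothetical helper name, and it only tests that each path is a readable regular file, not that its contents are valid.

```shell
#!/bin/sh
# Sketch: verify that every makedb input is a readable regular file.
# check_makedb_inputs is a hypothetical helper, not part of DIAMOND.

# check_makedb_inputs FILE...: return 0 only if all arguments are readable files.
check_makedb_inputs() {
    status=0
    for f in "$@"; do
        # -f rejects directories and missing paths, -r rejects unreadable files.
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "not a readable file: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Called with the FASTA file and the three taxonomy files (taxonmap, names.dmp, nodes.dmp) as arguments, it reports every path that is missing or is a directory and returns a non-zero status if any check fails.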