bbuchfink / diamond

Accelerated BLAST compatible local sequence aligner.
GNU General Public License v3.0

Error: taxonomy, diamond #638 #642

Open joseailtoncruz opened 1 year ago

joseailtoncruz commented 1 year ago

I am using DIAMOND. The following command runs normally:

<./diamond blastx -d reference.dmnd -q queries.fasta -o teste --outfmt 5>

However, I would also like to obtain the taxonomy. I followed the instructions in the manual, but an error message always appears:

diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

CPU threads: 12

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)

Target sequences to report alignments for: 25

Temporary directory:
Opening the database... [0.000482s]
Loading taxonomy... No such file or directory [0.000119s]
Error: Error opening file prot.accession2taxid.FULL.gz

The command line I used was:

<./diamond blastx -d reference.dmnd -q queries.fasta -o matches --taxonmap prot.accession2taxid.FULL.gz --outfmt 5>

The file passed to --taxonmap was downloaded from NCBI (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/), following these instructions from the manual:

"Path to the output DIAMOND database file.

--taxonmap

Path to mapping file that maps NCBI protein accession numbers to taxon ids (gzip compressed). This parameter is optional and needs to be supplied in order to provide taxonomy features. The file can be downloaded from NCBI: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz

Versions older than v2.0.7 only support the reduced mapping file: ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz

A custom file following the same format may be supplied here. Note that the first line of this file is assumed to contain headings and will be ignored."

I would like to know how I can obtain the taxonomy of the isolates I am analyzing.

Thank you in advance for your help, and I apologize for my English.

bbuchfink commented 1 year ago

It means the file does not exist. Make sure that prot.accession2taxid.FULL.gz is in the same directory.
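
A quick way to confirm this before launching a search is to test for the file from the shell. The following is only a sketch: the helper name `require_file` is hypothetical, and the file names are the ones from the command quoted above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path is a readable regular file.
require_file() {
    if [ -f "$1" ] && [ -r "$1" ]; then
        return 0
    fi
    # Report the directory that relative paths are resolved against.
    echo "missing or unreadable: $1 (current directory: $(pwd))" >&2
    return 1
}

# Usage (commented out so the sketch runs without diamond installed):
# require_file prot.accession2taxid.FULL.gz && \
#     ./diamond blastx -d reference.dmnd -q queries.fasta -o matches \
#         --taxonmap prot.accession2taxid.FULL.gz --outfmt 5
```

If the check fails, either move the mapping file into the directory the command is run from or pass its full path to --taxonmap.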

joseailtoncruz commented 1 year ago

I already tried downloading the latest version of DIAMOND (https://github.com/bbuchfink/diamondv2.1.0) and ran the following commands. To create the database:

<~$ diamond makedb --in refseque_proteina_2022.fasta.gz -d reference1 --taxonmap taxdmp.zip>

which generated a file. Then I ran:

<~$ diamond blastx -d reference1.dmnd -q contigs.fasta -o result1.xml --outfmt 5>

which created the file result1.xml. This file contained my blastx results, but unfortunately no taxonomy data (I am viewing the blastx results in Geneious 11). I also tried the following command:

<~$ diamond blastx -d reference1.dmnd -q contigs.fasta -o result1.XML --outfmt 5 --taxonmap taxdmp.zip>

which returned an error:

~$ diamond blastx -d reference1.dmnd -q contigs.fasta -o result1.xml --outfmt 5 --taxonmap taxdmp.zip
diamond v0.9.14.115 | by Benjamin Buchfink buchfink@gmail.com
Licensed under the GNU AGPL https://www.gnu.org/licenses/agpl.txt
Check http://github.com/bbuchfink/diamond for updates.

CPU threads: 12

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)

Target sequences to report alignments for: 25

Temporary directory:
Opening the database... No such file or directory [0.000105s]
Error: Error opening file reference1.dmnd

Note that this file, reference1.dmnd, was generated with the command <~$ diamond makedb --in refseque_proteina_2022.fasta.gz -d reference1 --taxonmap taxdmp.zip>.

Could you tell me a command line that works to show the taxonomy of viral species?

Thank you in advance for your help and I apologize for my English.

bbuchfink commented 1 year ago

It means the file reference1.dmnd is not found in the current directory.
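
One way to narrow this down is to check, from the directory where the search is launched, whether the database file is actually there. This is a minimal sketch only; `db_exists` is a hypothetical helper name, and reference1.dmnd is the file name taken from the error message above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a database file exists at the given path.
db_exists() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "not found: $1 (current directory: $(pwd))" >&2
        return 1
    fi
}

# Usage, from the directory where blastx is run:
# db_exists reference1.dmnd
#
# The log above reports diamond v0.9.14.115, so it may also be worth
# printing which executable the shell resolves for "diamond":
# command -v diamond
```

If the file is reported as not found, the database was probably written to a different directory than the one the search is run from, and -d needs the path to that location.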