Closed jbatovska closed 5 years ago
Hi,
this feature is not available in Diamond at the moment, but I could probably implement it within the next week or so. The info you require is contained in the taxdmp.zip file from NCBI. The names.dmp file can be used to map taxon ids to names, so you could also write a simple Python script to perform this mapping.
Best regards,
Benjamin
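A minimal sketch of such a mapping script, assuming the usual names.dmp layout from taxdmp.zip (fields separated by tab-pipe-tab, each row ending in tab-pipe, with the name class in the fourth field). The function name load_scientific_names is made up for illustration and is not part of Diamond.

```python
def load_scientific_names(handle):
    """Return {taxon id: scientific name} from a names.dmp-style text stream."""
    names = {}
    for line in handle:
        line = line.rstrip("\n")
        # Each row is terminated by "\t|"; drop it before splitting.
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        # Assumed field order: tax_id, name, unique name, name class.
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names
```

It could be used as `names = load_scientific_names(open("names.dmp", encoding="utf-8"))`, followed by `names.get(taxid)` for each taxon id in the Diamond output.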
Thanks for the information. If this feature ends up being implemented in Diamond, could you please let me know via this thread? Thank you very much.
Sure, it shouldn't be too long.
Hi Benjamin
Any idea on a time-frame for this feature? I'm about to put together a new pipeline implementing DIAMOND and this would greatly simplify things.
Thanks!
Hi Alexander,
I'll try to get it implemented within 1 week (but can't make any promises).
The latest commit now supports the sscinames output field. For makedb you have to use the --taxonnames parameter to specify the path to the names.dmp file from the NCBI taxdmp.zip in addition to the --taxonmap parameter.
Let me know if you need anything else.
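For anyone scripting this in a pipeline, here is a rough sketch of how the makedb call might be assembled. All file names are placeholders, and makedb_command is a hypothetical helper. The --taxonnodes option is included on the assumption that nodes.dmp is needed as well; check diamond help for the options your version supports.

```python
import shlex

def makedb_command(fasta, db, taxonmap, taxonnodes, taxonnames):
    """Build the argument list for a diamond makedb call with taxonomy files."""
    return [
        "diamond", "makedb",
        "--in", fasta,               # protein FASTA to index
        "-d", db,                    # output database name
        "--taxonmap", taxonmap,      # accession-to-taxid mapping file
        "--taxonnodes", taxonnodes,  # nodes.dmp from taxdmp.zip
        "--taxonnames", taxonnames,  # names.dmp from taxdmp.zip
    ]

# Placeholder paths; replace them with your own files.
cmd = makedb_command("nr.faa", "nr", "prot.accession2taxid.gz",
                     "nodes.dmp", "names.dmp")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths are real.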
Thanks for implementing this so quickly! Greatly appreciated.
Thank you for implementing this!
Hi @bbuchfink, thanks for implementing this feature. Do you plan a new release soon?
Yes, there should be a new release within 1-2 weeks.
Hello, this thread helped me with a similar issue. Thank you!
I am having trouble using the --taxonnames parameter while making the database now, and I am getting an error saying it is an invalid option. --taxonmap and --taxonnodes are working fine.
What version of diamond are you using? Check with diamond --version.
diamond --version
It is diamond version 0.9.14
You will need to use a later version then; the option was not yet supported in that release.
Could I output with the '-f 102' format, but with 'sscinames' as a column instead of 'NCBI taxonomy ID'?
No, that is not supported at the moment.
I am also looking for a way to obtain the taxonomic name of the Diamond blast hits. Is there really no way? Alternatively, did anyone write a script which can return the scientific name from a list of 'NCBI taxonomy ID' values? I have thousands of sequences. I would be grateful for help with this. I thought I had fixed it by building the nr database with --taxonmap and --taxonnodes, but following this discussion it seems this feature worked at some point and no longer does?
Do you need this for the tabular output format (-f 6) or the taxonomy output format?
Sorry for the late reply, I was out of office for a few days. I have now tried this setting:
-f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore staxids sscinames stitle --top 2
which gave me an output like this:
seq1 MBZ5604617.1 75.0 76 19 0 2 229 133 208 2.04e-29 120 2081523 2081523 MBZ5604617.1 insulinase family protein [Acidobacteriia bacterium]
I am currently only interested in the last piece of information, the taxon given in the [], but I would much rather have the full taxonomic string (domain, phylum, etc.). Is that possible?
You need to build the database with the --taxonnames option. Due to a bug in the current version, you should build it using v2.0.15. Then the output fields sscinames, sskingdoms (super kingdom), skingdoms and sphylums should be available. Other ranks are not supported, although it would be possible to add this in a future version.
It worked perfectly - thank you!
Hi, can anyone get the full taxonomy list in their output?
I would like to get families and subfamilies if possible.
qseqid | sseqid | pident | length | mismatch | evalue | bitscore | staxids | sscinames | sskingdoms | skingdoms | sphylums | stitle |
---|---|---|---|---|---|---|---|---|---|---|---|---|
b10d4f95-363e-42b1-9b35-1eca05f80366 | YP_004894429.1 | 40.2 | 102 | 61 | 4.08E-17 | 78.6 | 1094892 | 1094892 | 0 | 0 | 0 | YP_004894429.1 clamp loader of DNA polymerase [Megavirus chiliensis] |
Here is my code:
diamond blastx -d /lustre/project/taw/kvigil/Reference/viralprotein.dmnd -q barcode04.fastq.gz -o ONR030223barcode04.tsv --ultra-sensitive -f 6 qseqid sseqid pident length mismatch evalue bitscore staxids sscinames sskingdoms skingdoms sphylums stitle
Do I need to add anything extra for long-read nanopore sequences when I execute Diamond?
@lucyintheskyzzz As I understand it, getting the lower taxonomic ranks (below phylum, down to species) is not supported by Diamond (yet?). I would be really interested in this too. I am currently trying to use MEGAN (meganizer) to obtain this output, but it's not working for me, and I have been waiting a few weeks now for support from the MEGAN community on the issues I am having with that approach.
I'm not sure what you mean. Is the problem that the sskingdoms, skingdoms and sphylums fields are 0, or do you need other fields?
Hi @bbuchfink, I am actually getting the kingdoms and phylums, but I would like to get class, family, subfamily, genus and species if possible. The rows with zeros are viruses that have not been classified yet. Usually, in metagenomic sequencing publications on viruses, people use the viral family to create figures.
@Mathildebd Yes, I have asked on the MEGAN community many times how to export a .tsv file similar to Diamond's and I still can't figure it out. Everything exports as a .txt file in MEGAN, and when I switch it to .csv all the data is still in one column in Excel. The output from Diamond is what I have been looking for; I just need class, family, subfamily, genus and species if possible. This would be amazing!
length | mismatch | evalue | bitscore | staxids | sscinames | sskingdoms | skingdoms | sphylums | stitle |
---|---|---|---|---|---|---|---|---|---|
37 | 19 | 1.09E-05 | 45.8 | 2267653 | Erwinia phage Wellington | Viruses | Heunggongvirae | Uroviricota | YP_009806609.1 hypothetical protein HOT70_gp176 [Erwinia phage Wellington] |
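Until Diamond reports ranks such as class or family itself, one possible workaround is to walk up the NCBI tree from each staxids value. The sketch below assumes the nodes.dmp layout from taxdmp.zip (tax_id, parent tax_id and rank as the first three tab-pipe-tab separated fields) and a {taxid: name} dictionary built from names.dmp. The names load_nodes and lineage are hypothetical, and the rank labels depend on the taxonomy dump you downloaded.

```python
def load_nodes(handle):
    """Return {taxid: (parent taxid, rank)} from a nodes.dmp-style text stream."""
    nodes = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]
        fields = line.split("\t|\t")
        if len(fields) >= 3:
            nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes

def lineage(taxid, nodes, names,
            ranks=("phylum", "class", "order", "family",
                   "subfamily", "genus", "species")):
    """Return one name per requested rank, or "" where the rank is absent."""
    found = {}
    seen = set()
    # Climb parent links until the root (which is its own parent) or a gap.
    while taxid in nodes and taxid not in seen:
        seen.add(taxid)
        parent, rank = nodes[taxid]
        if rank in ranks:
            found[rank] = names.get(taxid, str(taxid))
        taxid = parent
    return [found.get(rank, "") for rank in ranks]
```

Joining the returned list with tabs gives extra columns that can be appended to each -f 6 row. Unclassified viruses will simply have empty cells for the missing ranks.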
Hello,
I am trying to get Diamond output that includes taxonomic information (similar to sscinames and sskingdoms in blast) and have made a database which included the --taxonmap and --taxonnodes parameters. I am using the 102 taxonomic classification output, and this is giving me the query ID, the NCBI taxonomic ID and the evalue.
How do I then turn this into something that will tell me the species/family/kingdom of the match? I have files with thousands of sequences and it would be really handy if there was an output that could tell me the species/family/kingdom of each match, instead of just a number?
Apologies if this has been addressed elsewhere, I just wasn't sure of how to proceed.
Thank you for your help.
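One way to post-process the -f 102 output described above is sketched here. It assumes three tab-separated columns (query ID, NCBI taxon ID, e-value) and a {taxid: name} dictionary that you have already built from names.dmp. The function name annotate_taxonomy_output and the "NA" placeholder are illustrative choices only.

```python
def annotate_taxonomy_output(lines, names, missing="NA"):
    """Yield each input row with the taxon's name appended as an extra column."""
    for line in lines:
        row = line.rstrip("\n")
        parts = row.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        taxid = parts[1]
        # Unassigned or non-numeric ids fall back to the placeholder.
        name = names.get(int(taxid), missing) if taxid.isdigit() else missing
        yield row + "\t" + name
```

Because it streams line by line, it can be run over an open file handle with thousands of rows and the results written straight to a new .tsv file.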