Retrieving more taxonomic IDs than the ones present in "prot.accession2taxid.FULL"

Hello,

I built a DIAMOND database using:

diamond makedb --in nr.gz --db nr_diamond --taxonmap prot.accession2taxid.FULL --taxonnodes nodes.dmp --taxonnames names.dmp --threads 72

I then ran diamond blastp to get a tabular file with the subject sequence IDs and the matching taxonomic IDs. When I inspected some of the results, I found more than one taxonomic ID for some proteins ("29523" and "2292949" for 'WP_119979703.1'), even though there is only one matching taxonomic ID for that protein (the tax ID for 'WP_119979703.1' is '2292949') both in "prot.accession2taxid.FULL" and on the NCBI website.
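To show which hits are affected, a minimal Python sketch along these lines could list every subject that carries more than one taxonomic ID. It assumes the tabular output has the subject ID in the first column and the taxonomic IDs in the second (for example `--outfmt 6 sseqid staxids`, which is an assumption because the exact output format is not given above) and that multiple IDs are separated by ';'. The function name is hypothetical.

```python
from typing import Dict, Iterable, Set


def subjects_with_multiple_taxids(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Return subjects that have more than one taxonomic ID.

    Assumed layout (not confirmed by the post): tab-separated lines with
    the subject ID in column 1 and ';'-separated taxids in column 2.
    """
    taxids_by_subject: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sseqid, staxids = fields[0], fields[1]
        ids = {t for t in staxids.split(";") if t}
        taxids_by_subject.setdefault(sseqid, set()).update(ids)
    # Keep only the subjects with two or more distinct taxids.
    return {s: t for s, t in taxids_by_subject.items() if len(t) > 1}


# In-memory example using the accession and taxids quoted in this post.
example_lines = ["WP_119979703.1\t29523;2292949\n"]
multi = subjects_with_multiple_taxids(example_lines)
```

With a real file, the same function can be given an open file handle instead of the in-memory list, since it only iterates over lines.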

When I use MEGAN, the LCA algorithm may assign most of these entries to the root, so the taxonomic resolution is lost. I cannot search "prot.accession2taxid.FULL" manually, because it would take ages. Can you help me understand the issue?
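For checking many accessions without searching by hand, a single streaming pass over the mapping file might be enough. The sketch below is a rough illustration, not a tested tool: it assumes the file is tab-separated with a header line containing the column names `accession.version` and `taxid` (falling back to the first two columns if there is no header), and the helper names `lookup_taxids` and `open_text` are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Set


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def lookup_taxids(lines: Iterable[str], wanted: Set[str]) -> Dict[str, Set[str]]:
    """Collect taxids for the wanted accessions in one pass over the lines.

    Assumption: tab-separated columns, with an optional header naming
    'accession.version' and 'taxid'; otherwise columns 1 and 2 are used.
    """
    acc_col, tax_col = 0, 1
    found: Dict[str, Set[str]] = {}
    for i, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if i == 0 and "taxid" in fields:
            # Header line: locate the columns by name, then skip it.
            tax_col = fields.index("taxid")
            if "accession.version" in fields:
                acc_col = fields.index("accession.version")
            continue
        if len(fields) <= max(acc_col, tax_col):
            continue  # skip blank or malformed lines
        if fields[acc_col] in wanted:
            found.setdefault(fields[acc_col], set()).add(fields[tax_col])
    return found


# In-memory example; the second accession is a made-up placeholder.
example_map = [
    "accession.version\ttaxid\n",
    "WP_119979703.1\t2292949\n",
    "PLACEHOLDER_1.1\t1\n",
]
hits = lookup_taxids(example_map, {"WP_119979703.1"})
```

For the real file, `lookup_taxids(open_text(path), wanted)` reads it line by line, so memory use depends on the number of wanted accessions rather than on the size of the mapping file.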

Best regards.

bbuchfink / diamond, issue #599