Open eray-sahin opened 2 years ago
If you look up this protein on NCBI, you can see under "Identical Proteins" that there is an entry (MBO4974725.1) with taxon ID 29523. These entries are merged if you use the NR database. I don't have a good solution for this at the moment. Would an option to ignore all taxids above species rank help you?
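Until such an option exists, a similar filter could be applied as a post-processing step on the reported taxids. The following is only a minimal sketch, not a DIAMOND feature: it assumes the usual NCBI `nodes.dmp` layout (fields separated by `|`, with taxid, parent taxid, and rank as the first three fields), and it interprets "above species rank" as "neither the taxid nor any of its ancestors has rank `species`". The function names and the toy taxids in the comments are hypothetical.

```python
def parse_nodes(lines):
    """Parse nodes.dmp-style lines into {taxid: (parent_taxid, rank)}."""
    nodes = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def at_or_below_species(taxid, nodes):
    """Return True if the taxid or any of its ancestors has rank 'species'."""
    seen = set()
    while taxid in nodes and taxid not in seen:
        parent, rank = nodes[taxid]
        if rank == "species":
            return True
        seen.add(taxid)  # guards against the root, which is its own parent
        taxid = parent
    return False


def drop_higher_ranks(taxids, nodes):
    """Keep only taxids at or below species rank.

    Falls back to the unfiltered list when nothing would remain, so a hit
    never ends up with no taxid at all.
    """
    kept = [t for t in taxids if at_or_below_species(t, nodes)]
    return kept or list(taxids)


# Toy tree with made-up taxids: 1 (root) > 10 (genus) > 100 (species) > 1000 (strain)
TOY_NODES = parse_nodes([
    "1\t|\t1\t|\tno rank\t|",
    "10\t|\t1\t|\tgenus\t|",
    "100\t|\t10\t|\tspecies\t|",
    "1000\t|\t100\t|\tstrain\t|",
])
```

With the toy tree, `drop_higher_ranks([10, 100], TOY_NODES)` keeps only the species-level taxid, so a downstream LCA step would no longer be pulled up to the genus or the root by the higher-rank entry.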
Hello,
I built a DIAMOND database using:
diamond makedb --in nr.gz --db nr_diamond --taxonmap prot.accession2taxid.FULL --taxonnodes nodes.dmp --taxonnames names.dmp --threads 72
and ran diamond blastp to get a tabular file with subject sequence IDs and matching taxonomic IDs. When I inspected the results, I found that some proteins have more than one taxonomic ID in the output ("29523" and "2292949" for 'WP_119979703.1'), even though only one taxonomic ID is listed for them in "prot.accession2taxid.FULL" and on the NCBI website (for example, the tax ID for 'WP_119979703.1' is '2292949').
When I use MEGAN, the LCA algorithm may assign most such entries to the root, losing taxonomic resolution. I cannot search "prot.accession2taxid.FULL" manually, because it would take ages. Can you help me understand the issue?
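For checking many accessions without a manual search, the mapping file can be streamed once while looking up a set of accessions. This is a minimal sketch under stated assumptions: the file is tab-separated, with the versioned accession in the first column and the taxid in the last column, and the helper names `open_maybe_gzip` and `lookup_taxids` are made up for illustration.

```python
import gzip
import io


def open_maybe_gzip(path):
    """Open a plain-text or gzip-compressed file for reading as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def lookup_taxids(handle, wanted):
    """Stream an accession-to-taxid table and collect taxids for wanted accessions.

    Assumes tab-separated lines: accession.version first, taxid last.
    Returns {accession: set of taxid strings}; memory use depends only on
    the number of wanted accessions, not on the size of the file.
    """
    wanted = set(wanted)
    hits = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        accession = fields[0]
        if accession in wanted:
            hits.setdefault(accession, set()).add(fields[-1])
    return hits


# Small in-memory demo. The first data row uses the accession and taxid quoted
# in this post; the second row is a made-up placeholder.
demo = io.StringIO(
    "accession.version\ttaxid\n"
    "WP_119979703.1\t2292949\n"
    "XX_000000001.1\t12345\n"
)
demo_hits = lookup_taxids(demo, {"WP_119979703.1"})
```

On the real file, the call would look like `lookup_taxids(open_maybe_gzip("prot.accession2taxid.FULL"), accessions)`, where `accessions` is the set of subject IDs taken from the blastp output. A single pass over the file is still slow, but it only has to be done once for all accessions.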
Best regards.