DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

undef phylum #93

Closed XClaws closed 4 years ago

XClaws commented 4 years ago

Dear Team, I am trying to use blobtools to check the content of my assembly. I have run blastn, and most of my contigs have a hit with a taxid; I am sure they are in the right order for downstream analysis with blobtools. However, I found that all the contigs in the blobDB.table.txt file have undef in the phylum column. Why did this happen? Could you please help me fix it? Thank you!

DRL commented 4 years ago

From Laetsch and Blaxter, 2017

[..] three non-canonical taxonomic annotations are possible: ‘no-hit’, the suffix ‘-undef’ and ‘unresolved’. Sequences not assigned to any taxonomic group, or not passing the --min_score threshold, are labelled ‘no-hit’. If a NCBI TaxID has no explicit parent at a taxonomic rank, the suffix ‘-undef’ is appended to the next upper taxonomic rank for which one does exist. In cases where the score difference between the best and second-best hits is smaller than --min_diff, sequences are labelled as ‘unresolved’.

Seems like your organism has no phylum in the NCBI taxonomy database.
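To make the '-undef' rule quoted above concrete, here is a minimal Python sketch. It is not the blobtools source code: the function name `fill_undef`, the rank list, and the example lineage are all assumptions made for illustration, and the fallback to a bare 'undef' when no higher rank is defined is also an assumption.

```python
# Ranks from most to least inclusive (an assumed list, chosen for illustration).
RANKS = ["superkingdom", "phylum", "order", "family", "genus", "species"]


def fill_undef(lineage):
    """Return a label for every rank in RANKS.

    A rank missing from `lineage` gets the name of the nearest defined
    higher rank plus the suffix '-undef'. If no higher rank is defined,
    the label is plain 'undef' (an assumption of this sketch).
    """
    filled = {}
    last_defined = None  # name of the closest higher rank seen so far
    for rank in RANKS:
        name = lineage.get(rank)
        if name:
            filled[rank] = name
            last_defined = name
        elif last_defined:
            filled[rank] = last_defined + "-undef"
        else:
            filled[rank] = "undef"
    return filled


# Hypothetical lineage with no phylum, order, or family entry.
example = {
    "superkingdom": "Eukaryota",
    "genus": "Examplegenus",
    "species": "Examplegenus hypotheticus",
}
result = fill_undef(example)
for rank in RANKS:
    print(rank, result[rank], sep="\t")
```

In this sketch the missing phylum, order, and family all inherit the superkingdom name with '-undef' appended, while genus and species keep their own names.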

cheers,

dom

XClaws commented 4 years ago

Dear Dom,

I think you are right. Although my organism is in the nodeDB.txt, it seems it has no taxid in the nt database. Thank you!

DRL commented 4 years ago

Hi XClaws,

No worries. It is not as uncommon as one might think. You can check the taxonomy lineage for your organism on https://www.ncbi.nlm.nih.gov/taxonomy ...

And then just generate plots/tables using the taxonomic ranks that make sense for your organism(s)...
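One rough way to judge which ranks "make sense" is to check how many contigs carry an undefined label at each rank. The Python sketch below assumes you have already collected the per-rank labels into lists yourself; the helper name `undef_fraction` and the example labels are hypothetical, not real blobtools output or part of its API.

```python
def undef_fraction(labels):
    """Fraction of labels that are 'undef' or end in '-undef'."""
    if not labels:
        return 0.0
    undefined = sum(
        1 for label in labels if label == "undef" or label.endswith("-undef")
    )
    return undefined / len(labels)


# Hypothetical labels for four contigs at two ranks.
labels_by_rank = {
    "superkingdom": ["Eukaryota", "Eukaryota", "Bacteria", "Eukaryota"],
    "phylum": [
        "Eukaryota-undef",
        "Eukaryota-undef",
        "Proteobacteria",
        "Eukaryota-undef",
    ],
}

fractions = {
    rank: undef_fraction(labels) for rank, labels in labels_by_rank.items()
}
for rank, fraction in fractions.items():
    print(f"{rank}\t{fraction:.2f}")
```

A rank where most labels are undefined, like phylum in this made-up example, is probably a poor choice for plots, and a neighbouring rank may be more informative.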

cheers,

dom