DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0

Missing node in a taxonomy - undef #56

Closed: alducluzeau closed this issue 7 years ago

alducluzeau commented 7 years ago

Hi,

Thank you for creating BlobTools. It's a very promising program! I installed it in order to help with the decontamination of a genome assembly. My first job just completed, and I discovered the "undef" suffix in the table created by "view". Consequently, I got no plots.

After going through the BlobTools documentation, I understood the reason: I'm assembling the genome of a Sphaeroforma, whose lineage is missing taxonomic nodes between the kingdom and the class. However, my organism does have a class/order/genus/species assigned. I launched a second job and added the -r flag with "genus" to see if I could bypass the problem, but it seems not.

I find this behaviour odd and pretty unfortunate: some taxonomic information is missing and we can't do anything about that, but not being able to use the taxonomic information downstream of the missing node is quite a loss. I see this as a serious limitation of the program. Thus, I wonder whether I missed a trick that could help, or whether the "undef" suffix/missing node could be mentioned on the main page to warn users more obviously.

Thank you,

Anne-Lise

DRL commented 7 years ago

Hi Anne-Lise,

> I installed it in order to help with the decontamination of a genome assembly. My first job just completed, and I discovered the "undef" suffix in the table created by "view". Consequently, I got no plots.

"Undefs" should not prevent you from generating plots; I assume the plot was simply uninformative.

> After going through the BlobTools documentation, I understood the reason: I'm assembling the genome of a Sphaeroforma, whose lineage is missing taxonomic nodes between the kingdom and the class. However, my organism does have a class/order/genus/species assigned. I launched a second job and added the -r flag with "genus" to see if I could bypass the problem, but it seems not.

Have you tried -r class or -r order with view/plot?
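To illustrate why switching rank can help: when contigs are grouped by their taxon at one chosen rank, every contig whose lineage lacks that rank lands in a single "undef" bin, whereas a rank the lineage does have keeps target and contaminant apart. This is a minimal sketch, not BlobTools code; the contig names, lengths and taxon names are hypothetical toy data, and summing contig length per taxon is only an assumed stand-in for what view/plot summarise.

```python
from collections import Counter

# HYPOTHETICAL toy data: contig -> (length in bp, {rank: taxon}).
# "undef" marks a rank that is missing from that contig's lineage.
CONTIGS = {
    "contig_1": (120_000, {"phylum": "undef", "class": "TargetClass", "order": "TargetOrder"}),
    "contig_2": (80_000, {"phylum": "undef", "class": "TargetClass", "order": "TargetOrder"}),
    "contig_3": (5_000, {"phylum": "ContaminantPhylum", "class": "ContaminantClass", "order": "ContaminantOrder"}),
}


def span_by_rank(contigs, rank):
    """Sum contig lengths per taxon at one rank; absent ranks count as 'undef'."""
    totals = Counter()
    for length, lineage in contigs.values():
        totals[lineage.get(rank, "undef")] += length
    return totals


if __name__ == "__main__":
    # Compare the grouping at a rank the lineage lacks vs. one it has.
    for rank in ("phylum", "order"):
        print(rank, dict(span_by_rank(CONTIGS, rank)))
```

At phylum the 200 kb of target sequence collapses into one "undef" bin; at order it is labelled and clearly separated from the 5 kb contaminant.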

> I find this behaviour odd and pretty unfortunate: some taxonomic information is missing and we can't do anything about that, but not being able to use the taxonomic information downstream of the missing node is quite a loss. I see this as a serious limitation of the program.

I'd say this is a limitation of the current state of the NCBI taxonomy, rather than of BlobTools ... BlobTools uses the NCBI taxonomy to link sequence similarity search results (via TaxIDs) to the NCBI taxonomy. The fact that some organisms have no assigned values at certain taxonomic ranks is beyond my control, I think ...
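A rough sketch of the general mechanism, assuming a simple lineage walk (this is not BlobTools' actual implementation): a hit's TaxID is followed up the taxonomy tree, where NCBI's nodes.dmp stores a parent TaxID and a rank for every node, and the name found at each requested rank is recorded. Any requested rank that never appears on the path to the root stays "undef". The TaxIDs and names below are hypothetical, and the exact label BlobTools prints for a missing rank may differ.

```python
# Minimal sketch: resolve a TaxID to names at fixed ranks by walking up a
# taxonomy tree. NOT BlobTools' code; the tree below is HYPOTHETICAL toy data
# shaped like NCBI nodes.dmp (tax_id -> (parent tax_id, rank)).
NODES = {
    1: (1, "no rank"),        # root is its own parent, as in NCBI nodes.dmp
    10: (1, "superkingdom"),
    20: (10, "no rank"),      # unranked clade: this lineage has no phylum
    30: (20, "class"),
    40: (30, "order"),
    50: (40, "family"),
    60: (50, "genus"),
    70: (60, "species"),
}
# HYPOTHETICAL scientific names keyed by tax_id (cf. NCBI names.dmp).
NAMES = {
    1: "root",
    10: "ExampleSuperkingdom",
    20: "UnrankedClade",
    30: "ExampleClass",
    40: "ExampleOrder",
    50: "ExampleFamily",
    60: "ExampleGenus",
    70: "Example species",
}
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def resolve_lineage(taxid, nodes=NODES, names=NAMES, ranks=RANKS):
    """Return {rank: name} for each requested rank, or 'undef' if the
    path from taxid to the root contains no node of that rank."""
    found = {}
    current = taxid
    while True:
        parent, rank = nodes[current]
        if rank in ranks and rank not in found:
            found[rank] = names[current]
        if parent == current:  # reached the root
            break
        current = parent
    return {rank: found.get(rank, "undef") for rank in ranks}


if __name__ == "__main__":
    for rank, name in resolve_lineage(70).items():
        print(f"{rank:>12}: {name}")
```

For the toy species (TaxID 70) only phylum comes back as "undef"; class, order, family, genus and species are still resolved, which is why picking a rank the lineage actually has can sidestep the gap.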

> Thus, I wonder whether I missed a trick that could help, or whether the "undef" suffix/missing node could be mentioned on the main page to warn users more obviously.

I would check whether the results of -r order and/or -r class bring you any happiness.

There are other solutions as well, but let me know whether this works before I suggest more complex ones ... you might also want to check out this thread, which deals with custom taxonomic annotation of sequences outside of the NCBI Taxonomy.

Let me know whether this helps.

cheers,

dom