genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Taxid output "Nan" #22

Closed: Qi-Maria closed this issue 3 years ago

Qi-Maria commented 3 years ago

Hi, when I run NanoCLUST, I get a taxid of "nan" and the whole program crashes. I added a few lines of code to get_abundance.py that assign such taxids to the root. Is that the correct way to solve the problem, or is there another way you would suggest?
Like this:

    def get_taxname(tax_id, tax_level):
        tags = {"S": "species_name", "G": "genus_name", "F": "family_name",
                "O": "order_name", "C": "class_name", "P": "phylum_name"}
        tax_level_tag = tags[tax_level]
        # Added: map a missing ("nan") taxid to the root node
        if str(tax_id) == "nan":
            tax_id = 1

genomicsITER commented 3 years ago

Hi,

Thank you for opening the issue, and sorry for the late response. We have recently seen some users run into different issues at that step. Your suggestion is a reasonable fix for cases with "nan" tax_ids. We don't have any data available to test the pipeline under that condition, but we believe that assigning the 'root' node to 'nan' tax_ids could be the right choice. In any case, to inspect pipeline results we always recommend checking the .nanoclust.out file, which holds the original top BLAST assignments for each cluster, rather than the .csv files and plots generated later by the Python scripts.
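For readers hitting the same error, the fallback discussed above could be sketched as a small standalone helper. This is only an illustration, not the code that was merged into get_abundance.py: the names `normalize_taxid` and `ROOT_TAXID` are hypothetical, and the sketch assumes the tax_id may arrive as a string, an int, or a float NaN (for example from a table with empty cells). The one fixed fact it relies on is that the NCBI taxonomy root node has taxid 1.

```python
import math

# NCBI taxonomy root node; used as the fallback, as suggested in this thread.
ROOT_TAXID = 1


def normalize_taxid(tax_id, fallback=ROOT_TAXID):
    """Return tax_id as an int, or `fallback` when it is missing or not numeric.

    Hypothetical helper for illustration; not part of NanoCLUST.
    """
    if tax_id is None:
        return fallback
    try:
        value = float(tax_id)  # accepts 562, "562", "562.0", "nan", float("nan")
    except (TypeError, ValueError):
        return fallback  # empty or non-numeric strings
    if math.isnan(value) or math.isinf(value):
        return fallback
    return int(value)
```

Calling something like this before the tax_id-to-name lookup would send "nan" entries to the root instead of crashing. Clusters that end up at the root are worth checking against the .nanoclust.out file, since the fallback hides the fact that no taxid was resolved for them.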

We have added your get_abundance.py edit to the main branch along with some other changes in that file to avoid tax_ID-to-name issues. Thank you for your contribution and feel free to open an issue again if something is not working!