genomicsITER / NanoCLUST

NanoCLUST is an analysis pipeline for UMAP-based classification of amplicon-based full-length 16S rRNA nanopore reads
MIT License

Global OTU table #15

Open erifa1 opened 4 years ago

erifa1 commented 4 years ago

Dear all,

Thanks for providing this workflow. I'm wondering whether the pipeline performs global clustering across all sequences when a wildcard is used in the --reads option (*.fastq). I have tested this and I'm getting an independent results folder for each fastq file. Is there an option to get unified results, such as a classic OTU table that allows samples to be compared?

Thanks for your help,
Etienne

genomicsITER commented 4 years ago

Hi Etienne,

Thank you for opening this issue and for the suggestions. Input specification is something we would like to change in the near future, to make it easier for users to run the pipeline right after basecalling. At the moment, the --reads option only accepts bulk files containing all reads in the sample/barcode, in both --demultiplex and normal modes. You could generate this bulk file with "cat my_sample/*.fastq > mysample_bulk_file.fastq". At this time we do not know which approach to input specification is the most comprehensive, but we will discuss it and update this issue when the new input specification is ready.
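For anyone preparing several samples at once, the "cat" step above could be scripted along these lines. This is only a minimal sketch: the helper name build_bulk_fastq and the directory layout (one folder of uncompressed .fastq files per sample) are assumptions for illustration, not part of NanoCLUST.

```python
import shutil
from pathlib import Path


def build_bulk_fastq(sample_dir, bulk_path):
    """Concatenate every *.fastq file in sample_dir into one bulk file.

    Hypothetical helper (not part of NanoCLUST); it mirrors
    `cat my_sample/*.fastq > mysample_bulk_file.fastq`.
    Returns the number of input files that were concatenated.
    """
    sample_dir = Path(sample_dir)
    bulk_path = Path(bulk_path)
    # Collect inputs before opening the output, in sorted order for
    # reproducibility, and skip the output file if it sits in sample_dir.
    inputs = sorted(
        p for p in sample_dir.glob("*.fastq")
        if p.resolve() != bulk_path.resolve()
    )
    with bulk_path.open("wb") as out:
        for fastq in inputs:
            with fastq.open("rb") as src:
                # Stream the bytes so large read files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return len(inputs)
```

Calling build_bulk_fastq("my_sample", "mysample_bulk_file.fastq") once per sample folder would then give one bulk file per sample to pass to --reads.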

We have updated the pipeline to add abundance tables for all taxonomic levels, but at this time each table covers only one sample. We will work on providing an output file similar to an OTU table that includes all samples covered in a pipeline execution. We will update the issue when it is ready.
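Until such an output exists, the per-sample tables could in principle be combined after the run. The sketch below is not part of NanoCLUST and makes assumptions about the input: each sample has a CSV file with one taxon column and one abundance column, and the column names "taxon" and "abundance" are placeholders to be replaced with the real headers. Because each sample is clustered independently, the result is a taxon-by-sample table joined on taxon labels, not a set of OTUs shared across samples.

```python
import csv
from pathlib import Path


def merge_abundance_tables(tables, taxon_col="taxon", value_col="abundance"):
    """Merge per-sample CSV tables into {taxon: {sample: value}}.

    `tables` maps a sample name to the path of that sample's CSV file.
    The column names are assumptions; pass the headers your files use.
    Taxa missing from a sample get 0.0 for that sample.
    """
    samples = list(tables)
    merged = {}
    for sample, path in tables.items():
        with Path(path).open(newline="") as handle:
            for row in csv.DictReader(handle):
                taxon = row[taxon_col]
                # First time a taxon is seen, start every sample at 0.0.
                per_taxon = merged.setdefault(taxon, dict.fromkeys(samples, 0.0))
                per_taxon[sample] += float(row[value_col])
    return merged


def write_otu_table(merged, samples, out_path, taxon_col="taxon"):
    """Write the merged table as CSV: one row per taxon, one column per sample."""
    with Path(out_path).open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([taxon_col, *samples])
        for taxon in sorted(merged):
            writer.writerow([taxon, *(merged[taxon][s] for s in samples)])
```

For example, merge_abundance_tables({"sample1": "sample1_genus.csv", "sample2": "sample2_genus.csv"}) followed by write_otu_table(...) would give one table per taxonomic level; the file names here are made up.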

Regards,

Héctor

quetjaune commented 1 year ago

Thanks for this useful workflow. Any update on the OTU table for all samples?
Regards, Marcos