asrivathsan / ONTbarcoder


16S rRNA microbiome (variable length) demultiplexing and primer trimming #10

Open iivanov78 opened 4 months ago

iivanov78 commented 4 months ago

Hi there,

I have two questions:

  1. You have mentioned that "ONTbarcoder has been optimized for protein coding gene like COI. While there are ways to obtain consensus for length variable non coding genes, this has not been extensively tested." What would those ways be? I have ONT FASTQ reads from 16S rRNA microbiome amplicons (1200 bp to 2000 bp) with custom barcodes. Is there a way to process (demultiplex and trim) them with ONTbarcoder?

  2. Does the Dorado basecaller need to be installed (on Windows) for demultiplexing?

Best, Ivan

asrivathsan commented 4 months ago

Hi Ivan

  1. ONTbarcoder is mostly designed to obtain a dominant consensus sequence for experiments that use a tagged-amplicon approach. It is not really optimized for microbiome studies. You could use it for the initial demultiplexing and then process the demultiplexed data with other pipelines. In that case, I would suggest setting the minimum length parameter to 1200. It would also be good to widen the window that defines the product size and set the expected length to about 1600, to account for the length variability.

If you want a consensus sequence (the part that is usually not relevant for microbiome studies): for length-variable genes, I would limit the analysis to the "consensus by length" step only. I would also avoid subsetting the sequences by length, by setting the coverage at this step to 0.

  2. If you are using "conventional" mode, the data must already be basecalled. Dorado is integrated for basecalling only when real-time barcoding is being conducted, to allow for different types of setups.

Hope this helps! Amrita
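To make the length settings from the reply concrete, here is a minimal Python sketch of a read-length filter. It is not ONTbarcoder's code and does not reproduce its demultiplexing; it only illustrates what a minimum length of 1200 and an expected length of 1600 with a wide window mean for individual reads. The function names and the half-width of 400 are assumptions for illustration; the half-width is chosen so that the window spans the 1200 bp to 2000 bp amplicon range given in the question.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality); a hypothetical record type for this sketch
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # '+' separator line, ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def length_window(expected: int, half_width: int) -> Tuple[int, int]:
    """Return the inclusive (min, max) read lengths around an expected length."""
    return expected - half_width, expected + half_width


def filter_by_length(
    records: Iterable[FastqRecord], min_len: int, max_len: int
) -> List[FastqRecord]:
    """Keep only reads whose sequence length lies in [min_len, max_len]."""
    return [rec for rec in records if min_len <= len(rec[1]) <= max_len]


if __name__ == "__main__":
    # Assumed values: expected length 1600, half-width 400 (covers 1200 to 2000)
    min_len, max_len = length_window(expected=1600, half_width=400)
    with open("reads.fastq") as handle:  # hypothetical input file name
        kept = filter_by_length(parse_fastq(handle), min_len, max_len)
    print(f"kept {len(kept)} reads between {min_len} and {max_len} bp")
```

A filter like this can also serve as a quick sanity check on the demultiplexed output before it goes into a separate microbiome pipeline.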