Closed ankit4035 closed 1 month ago
I can only offer vague feedback as I haven't tried to use DADA2 on this type of data. My largest concern is with this feature of your data:
During library prep, primers are partially digested, therefore for the same target there could be multiple sequences with multiple lengths of primers sequenced and some having no primers as well.
DADA2 is sensitive to differences in the start positions of the reads, and seems likely to generate multiple (perhaps many) ASVs corresponding to the same allele in the situation you've described. This also has the effect of reducing sensitivity, since one AMR allele that is read 20 times, but with 20 different starting positions, is likely to be missed altogether. There probably is a way to mostly solve this, but I would consider it a major concern, and it would make me strongly consider an alternative bioinformatics approach that first maps reads to the known suite of AMR genes in order to separate them, and then proceeds from that.
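To make the sensitivity concern concrete, here is a minimal toy sketch in plain Python. It is not DADA2 code: it assumes exact-match dereplication only, and the `ALLELE` string and the `simulate_reads` helper are hypothetical, made up for illustration. It shows how 20 reads of one allele end up as 20 singletons once each read starts at a different position.

```python
from collections import Counter

# Hypothetical 40-base allele, invented for this example (not a real AMR gene).
ALLELE = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTA"


def simulate_reads(allele, n_reads, max_offset):
    """Return n_reads error-free reads of the allele.

    Read i starts at offset i % (max_offset + 1), mimicking partially
    digested primers that shift the start position of each read.
    """
    return [allele[i % (max_offset + 1):] for i in range(n_reads)]


def dereplicate(reads):
    """Count identical reads, as exact-match dereplication would."""
    return Counter(reads)


# All 20 reads share one start position: a single well-supported sequence.
same_start = dereplicate(simulate_reads(ALLELE, 20, 0))

# 20 reads with 20 different start positions: every read is a singleton.
varied_start = dereplicate(simulate_reads(ALLELE, 20, 19))

print(len(same_start), max(same_start.values()))
print(len(varied_start), max(varied_start.values()))
```

With identical starts the allele is one sequence supported by 20 reads; with varied starts the same 20 reads give 20 unique sequences of abundance 1, which is the case where an allele is likely to be missed.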
In principle, collapseNoMismatch will help with that issue, but it doesn't solve the loss of sensitivity, and it is both slow and probably not perfect.
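As a rough illustration of the idea behind collapsing sequences that differ only in start position or length, here is a small Python sketch. It is an assumption-laden stand-in, not the collapseNoMismatch implementation: the `collapse_contained` helper is hypothetical, it uses plain substring containment, and the toy counts are invented.

```python
def collapse_contained(counts):
    """Fold each sequence into the first kept sequence that contains it.

    counts maps sequence -> abundance. Longer sequences are visited first,
    so shorter fragments are absorbed by the longer sequences they sit in.
    """
    ordered = sorted(counts, key=lambda s: (-len(s), s))
    collapsed = {}
    for seq in ordered:
        # Look for an already-kept sequence that contains this one exactly.
        parent = next((kept for kept in collapsed if seq in kept), None)
        if parent is None:
            collapsed[seq] = counts[seq]
        else:
            collapsed[parent] += counts[seq]
    return collapsed


# Toy abundance table: the second and third entries are fragments of the first.
counts = {
    "ACGTACGTTT": 5,
    "GTACGTTT": 3,   # same sequence with a later start position
    "ACGTAC": 2,     # same sequence with an earlier end position
    "TTTTGGGG": 4,   # unrelated sequence, must stay separate
}

result = collapse_contained(counts)
print(result)
```

Note that collapsing of this kind only helps for sequences that were already called as variants. Reads that were dropped earlier because each start position was too rare are not recovered, and the pairwise containment check in this sketch grows quadratically with the number of sequences.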
Chimera removal. I don't know if this will work well as well.
The above starting-point variation issues could also affect chimera identification, similarly to how unremoved primers cause way too many reads to be removed as chimeras in standard 1-target amplicon sequencing data.
Thanks a lot for your comments and suggestions. I will explore the possibilities and see how it goes.
Hi Ben,
I am a big fan of DADA2 for analysing 16S data. I recently worked with paired-end AmpliSeq data generated on Illumina. In a nutshell, multiple targets (>800 AMR genes) were amplified and sequenced. I thought it would be interesting to analyse that data with DADA2 to find the sequence variants for each target. However, several properties of AmpliSeq data prevent it from being run directly through the DADA2 pipeline.
I worked out a pipeline for analysing such data starting with paired fastq files. The data was QC filtered and Illumina adapter trimmed (cutadapt).
tryRC = true. This is to merge all the variants that are the same but in reverse orientation. Since the same table is merged with itself, all counts will be doubled, which can be rectified downstream.
collapseNoMismatch. This is to collapse variants that originate from the same target but differ in length, with one being a subsequence of the other. This is just a preventive measure; I don't know if this is going to be useful at all.
Can you comment on the suitability of the pipeline flow?
Thank you