Closed larajan67 closed 4 months ago
Do these different sequencing runs use the same library preparation approach?
Some amplicon library preps will include the primers on the sequenced part of the reads, while others will not. Even if the primers used are the same, these different library prep approaches will lead to difference in the sequenced amplicon lengths of the total length of the two primers, which is usually ~30 nts.
That's my guess for what is going on here. It's fixable, but first it would be good to confirm or disprove that possibility. One can do this by simply inspecting the raw sequences from each run... is the primers sequence present at the start of the reads or not?
I have confirmed with my collaborators, who led the sequencing, that same library preparation approach was used.
I did find that the specific cutadapt code did not remove the primers in this 3rd run as it had for the first 2 runs. I slightly edited the cutadapt code and the primers were mostly removed (less than 1% remaining).
However, there remains still a difference in sequence length between the 3 runs: 1 & 2-214; 3-231 (see graph). Do you have any other ideas? Thank you for your help.
I'd recommend removing the cutadapt step that seems to be introducing this spurious length variation. Just use trimLeft
in filterAndTrim
to remove the primers instead.
I used the trimLeft in filterAndTrim instead of cutadapt. However, the sequence length difference remains with a difference of 14 between runs 1/2 and run 3. Any other ideas? Thank you, Lara
I would try aligning the most abundant ASVs in run 1 vs. run 3, and trying to identify where the additional sequence length observed in run 3 might be. I'm still betting it is some technical issue, but maybe not?
I apologize for the delayed response. I had another project with higher priority.
As I am relatively new to dada2, I would like to double check my approach for alignment of specific sequences. Would this be the best approach after filtering for the most abundant ASVs? (see below) sequences<-getSequences(seqtab.nochim_most) alignment <- AlignSeqs(DNAStringSet(sequences), anchor=NA) phang.align <- phyDat(as(alignment, "matrix"), type="DNA")
Then I would examine the phang.align output?
That works. The signal we are looking for here is pretty crude (where are the extra bases in run 3 vs run 1?), so any approach that lets you view the alignemnt of a few top ASVs from each run should do the trick.
Here are snapshots from the output of 'BrowseSeqs' for Run 1 (top) and Run 3 (bottom) for the most abundant ASVs. What do you recommend?
Hello, I am relatively new to DNA sequencing analyses so I appreciate the detailed tutorials and feedback on previous issues.
I have 16s rRNA data from 3 different sequence runs using the same primers. I have run all runs through the dada2 workflow using the same parameters for the 'FilterandTrim' step. 2 of the runs have reads mostly with a sequence length of ~214, while the 3rd run has reads mostly with a sequence length of ~239. I want to merge these runs together into a single dataset for downstream analysis, yet this difference of sequence lengths is an issue (see graph).
What can I do to resolve this?