benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Removing primers with trimLeft #1807

Open · nuorenarra opened this issue 1 year ago

nuorenarra commented 1 year ago

Dear Benjamin, apologies if this is not a real "issue", but I am curious why the dada2 tutorials (such as the 1.2 tutorial, though I think others as well) explicitly state that they assume you are working with sequences from which the primers have already been removed. I thought this is something one can do with the trimLeft argument during filtering and trimming. My practice has been to use cutadapt to assess how many bases to remove, and then to use trimLeft within dada2 to remove them from the R1 and R2 reads (the removed length is very similar for both in my case). Are there any problems with this approach I am failing to see? Thanks for your perspective!
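In dada2, `trimLeft` (an argument of the R function `filterAndTrim`) removes a constant number of bases from the start of every read, so the workflow described above amounts to "check that the primer sits at position 0, then cut `len(primer)` bases". The Python sketch below only mirrors that idea and is not dada2 code; the helper names are hypothetical, the 16S V4 primers 515F/806R are used purely as example sequences, and the reads are made-up toy data. Degenerate bases are matched exactly (no mismatches).

```python
# IUPAC degenerate base codes -> the nucleotides each code allows
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def starts_with_primer(read, primer):
    """True if the first len(primer) bases of the read match the degenerate primer exactly."""
    if len(read) < len(primer):
        return False
    return all(base in IUPAC[code] for base, code in zip(read, primer.upper()))


def trim_left_pair(r1_reads, r2_reads, fwd, rev):
    """Hypothetical helper: constant-length trim of paired reads, analogous in spirit
    to trimLeft = c(len(fwd), len(rev)). Raises if a primer is not at position 0."""
    for reads, primer in ((r1_reads, fwd), (r2_reads, rev)):
        for read in reads:
            if not starts_with_primer(read, primer):
                raise ValueError(f"primer {primer} not at start of read {read}")
    return [r[len(fwd):] for r in r1_reads], [r[len(rev):] for r in r2_reads]


FWD = "GTGYCAGCMGCCGCGGTAA"   # 515F, 19 nt (example only)
REV = "GGACTACNVGGGTWTCTAAT"  # 806R, 20 nt (example only)
R1 = ["GTGCCAGCAGCCGCGGTAATACGTAGGG", "GTGTCAGCCGCCGCGGTAATACGGAGGG"]  # toy reads
R2 = ["GGACTACAAGGGTATCTAATCCTGTTTG", "GGACTACCGGGGTTTCTAATCCTGTTCG"]  # toy reads

r1_trimmed, r2_trimmed = trim_left_pair(R1, R2, FWD, REV)
print(r1_trimmed, r2_trimmed)
```

The explicit primer check matters: a constant trim silently produces wrong output if the primer is not actually at the start of every read.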

benjjneb commented 1 year ago

> Are there any problems with this approach I am failing to see? Thanks for your perspective!

IF the primers are of constant length and sit at the very start of the forward/reverse reads, AND the reads are short enough (or are truncated to be short enough) not to read into the other primer, THEN the trimLeft approach works great.
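The IF/AND/THEN rule can be written as a small check. This Python sketch is illustrative only (dada2 is an R package and has no such function); the function names and the length convention are assumptions: reads start at the first base of their own primer, and amplicon length counts both primers. A read of length `L` reaches the opposite primer when `L > A - p`, where `A` is the amplicon length and `p` the opposite primer's length, so the shortest amplicon in the sample is the one that decides. Constant primer length is assumed by construction here.

```python
def reads_into_other_primer(read_len, amplicon_len, other_primer_len):
    """True if a read starting at the first base of its own primer extends into
    the opposite primer. amplicon_len includes both primers (assumed convention)."""
    return read_len > amplicon_len - other_primer_len


def trimleft_is_appropriate(primer_starts, read_len, amplicon_lens, other_primer_len):
    """Hypothetical check of the two conditions for constant-length trimming."""
    # Condition 1: the primer starts at position 0 in every read (no spacer or offset).
    primers_at_start = all(pos == 0 for pos in primer_starts)
    # Condition 2: reads (after any truncation) stop before the opposite primer,
    # even on the shortest amplicon in the sample.
    no_read_through = not reads_into_other_primer(read_len, min(amplicon_lens), other_primer_len)
    return primers_at_start and no_read_through


# Illustrative numbers only, not measurements.
print(trimleft_is_appropriate([0, 0, 0], 250, [292, 292, 293], 20))  # True: uniform amplicons
print(trimleft_is_appropriate([0, 3, 7], 250, [292, 292, 293], 20))  # False: spacer offsets
print(trimleft_is_appropriate([0, 0, 0], 250, [180, 350, 520], 20))  # False: 250 > 180 - 20
print(trimleft_is_appropriate([0, 0, 0], 150, [180, 350, 520], 20))  # True: truncated reads
```

The last line reflects the "or are truncated to be short enough" clause: truncating reads can restore the second condition, at the cost of shorter sequences.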

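That is the case for the majority of amplicon data, but there are important exceptions, such as heterogeneity-spacer designs, where a variable number of extra bases precedes the primer so that it no longer starts at a fixed position, or highly length-variable amplicons such as ITS, where the shorter amplicons are read through into the opposite primer.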
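For these exceptions the primer has to be located in each read rather than cut at a fixed position. The sketch below is a deliberately simplified stand-in for a dedicated primer-removal tool such as cutadapt, not a reimplementation of it: it matches degenerate bases exactly (no mismatches or indels), searches for the forward primer within a hypothetical `max_offset` window to tolerate a spacer, and cuts the read where the reverse complement of the reverse primer appears (read-through on a short amplicon). The 6-nt primers and the reads are made-up toy sequences.

```python
import re

# IUPAC degenerate base codes -> regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also maps degenerate codes (R<->Y, K<->M, B<->V, D<->H)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def to_regex(primer):
    """Compile a degenerate primer into a regex (exact match, no errors allowed)."""
    return re.compile("".join(IUPAC[b] for b in primer.upper()))


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_primers(read, fwd, rev, max_offset=8):
    """Return the sequence between the primers, or None if fwd is not found.

    fwd is searched for in the first len(fwd) + max_offset bases, so a
    variable-length spacer before the primer is tolerated. If the reverse
    complement of rev occurs after fwd, the read ran through a short amplicon
    and everything from that point on is cut off.
    """
    hit = to_regex(fwd).search(read[: len(fwd) + max_offset])
    if hit is None:
        return None
    insert = read[hit.end():]
    through = to_regex(revcomp(rev)).search(insert)
    return insert[: through.start()] if through else insert


FWD, REV = "CCTAYG", "GGACTR"  # hypothetical toy primers
reads = {
    "no spacer, long amplicon": "CCTACGTTTTGGGGAAAA",
    "2-nt spacer": "ATCCTATGTTTTGGGGAAAA",
    "short amplicon, read-through": "CCTACGTTGGCACAGTCCAGATCG",
}
for label, read in reads.items():
    fixed = read[len(FWD):]  # what a constant trim of len(FWD) bases would keep
    print(f"{label}: constant trim -> {fixed}, search-based -> {trim_primers(read, FWD, REV)}")
```

On the spacer read the constant trim keeps leftover primer bases, and on the read-through read it keeps the reverse primer and the bases after it; the search-based version recovers the same insert in the first two cases and the short insert in the third.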

We could perhaps update our wording, since the trimLeft approach is recommended when it is appropriate, but alternative solutions are needed for those exceptions.

nuorenarra commented 1 year ago

Thanks a lot, this is very helpful.

I guess many people work with 16S and ITS data in parallel, so it might help them if the tutorial mentioned that one must make sure the amplicons and primers are of constant length before using trimLeft for primer removal, and that this is not the case for ITS, for example.