benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Use of priors with ITS data #1599

Open ramiroricardo opened 2 years ago

ramiroricardo commented 2 years ago

Dear all,

We have an ITS dataset to analyse, and there is a species of interest that we would like to detect with greater sensitivity. We could use pooling, but we were also wondering about the use of priors and whether that makes sense for ITS data.

In our case, we are analyzing only forward reads and using truncQ and maxEE, but not truncLen. This leads to reads of variable length, in principle even for the exact same taxon. #496 indicates that priors must have exactly the same length as the processed reads, which is fine for 16S data but, I am guessing, won't work for ITS data. This would mean we should not (yet) use priors with ITS data.

Am I missing something / is there a way in which priors can still be included when analyzing ITS data?

Thanks

benjjneb commented 2 years ago

My first suggestion is to stop using truncQ, to get rid of the artificial length variation that this filtering technique introduces. It is almost never the right choice when analyzing data with DADA2.

If that is eliminated, then it becomes possible to use priors with ITS data, although one does have to be precise about setting the prior sequences to be of the same length as they will be in the data. However, if the target species ITS sequence is known, that should be achievable. Note that if working with paired-end data, you'll need to create priors separately for the forward and reverse reads.
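
To make the length-matching point concrete, here is a minimal Python sketch (an illustration only, not DADA2 code). It assumes the known ITS amplicon is given with primers already removed and that the processed reads have a known length; the function names and the example lengths are hypothetical.

```python
# Illustrative sketch: derive candidate prior sequences from a known amplicon.
# Assumptions (not from the thread): primers are already removed from
# `amplicon`, and `read_len` is the length of the reads after filtering.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_forward_prior(amplicon: str, read_len: int) -> str:
    """Forward-read prior: the first `read_len` bases of the amplicon."""
    return amplicon.upper()[:read_len]


def make_reverse_prior(amplicon: str, read_len: int) -> str:
    """Reverse-read prior: the first `read_len` bases of the reverse
    complement, since reverse reads start at the other end of the amplicon."""
    return reverse_complement(amplicon.upper())[:read_len]


def prior_matches_reads(prior: str, observed_read_lengths: set) -> bool:
    """Check that the prior has a length that actually occurs in the data."""
    return len(prior) in observed_read_lengths
```

If the amplicon is shorter than `read_len`, the slices simply return the whole sequence, so the length check is still worth running against the lengths seen in the filtered reads. In R, the resulting strings would be the kind of character vector passed to the `priors` argument of `dada()`, one vector for the forward reads and a separate one for the reverse reads.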

ramiroricardo commented 2 years ago

Thanks for your quick reply. Just one further question: we were using truncQ because it is recommended in the DADA2 ITS workflow. Why would it not be recommended? Is it because we are using forward reads only?

benjjneb commented 2 years ago

Hm, I see that there now. By default, truncQ=2 is enforced by filterAndTrim anyway, as several versions of CASAVA/Illumina would start assigning Q=2 when they no longer had any idea what the bases were. If you have just been using truncQ=2, I'm actually not too concerned, as that should introduce relatively little length variation. But the way the tutorial is written, it looks like truncQ=XX is a parameter we recommend potentially tuning, and that's not the case, even for ITS data.
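
The effect described above can be sketched in a few lines of Python. This is a simplified illustration, not the filterAndTrim implementation: it assumes the documented truncQ rule (truncate a read at the first base whose quality score is less than or equal to truncQ), and the function name and quality values are made up for the example.

```python
# Illustrative sketch of quality-based truncation (not DADA2 source code).
# Rule assumed: cut the read at the first position with quality <= trunc_q.

def truncate_at_quality(seq: str, quals: list, trunc_q: int = 2) -> str:
    """Return `seq` truncated before the first base with quality <= trunc_q."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            return seq[:i]
    return seq


# Two reads of the same sequence with different (hypothetical) quality profiles.
seq = "ACGTACGTAC"
quals_good = [38, 38, 38, 38, 38, 38, 38, 38, 38, 38]
quals_decaying = [38, 38, 37, 36, 30, 25, 18, 12, 2, 2]

# With trunc_q=2 only the Q=2 tail is removed; with a higher threshold the
# same underlying sequence ends up at different lengths in different reads.
lengths_q2 = {len(truncate_at_quality(seq, q, 2)) for q in (quals_good, quals_decaying)}
lengths_q20 = {len(truncate_at_quality(seq, q, 20)) for q in (quals_good, quals_decaying)}
```

Identical sequences truncated to different lengths can no longer be matched exactly, either to one another or to a fixed-length prior, which is why higher truncQ values work against the use of priors.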

ramiroricardo commented 2 years ago

Thanks for the reply. Indeed we started wondering about changing truncQ after seeing this paper, where several values are tested: https://insight.jci.org/articles/view/151663