benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Increasing number of nonchimeric reads compared to the number of input reads #1947

Open Abdelrahim-maker opened 1 month ago

Abdelrahim-maker commented 1 month ago

Hi everyone

I was using dada2 in R on my arbuscular mycorrhizal fungi reads, and I noticed that after truncating them I got more nonchimeric reads than input reads. For example, one sample had an input of 666 reads and 47,000 nonchimeric reads. Another sample had an input of 8 reads and 49 nonchimeric reads. How is that possible?

Thanks

benjjneb commented 1 month ago

It's not possible. How are you arriving at this result?

Abdelrahim-maker commented 1 month ago

@benjjneb apologies for the late reply. This is how I was doing it.

Parameters used:

filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=c(280,275), maxN=0, maxEE=c(2,2), trimLeft = 10, truncQ=2, rm.phix=FALSE, compress=FALSE, multithread=n_cores, verbose = TRUE)

Overall median loss (difference between input and nonchimeric) = 15311, percentage = 46.99108
Overall minimum percentage loss = 32.5477
Overall maximum percentage loss = 100

benjjneb commented 1 month ago

Can you provide the output of read tracking through the workflow, as demonstrated in the read tracking section of the dada2 tutorial? https://benjjneb.github.io/dada2/tutorial.html#track-reads-through-the-pipeline

Abdelrahim-maker commented 1 month ago

16s_reads_summary (1) (1) (2).csv densoised_reads_amf (5) (2).csv

These are the outputs for AMF and 16S rRNA, where I noticed issues after using these parameters.

benjjneb commented 1 month ago

Can you clarify what your issue is? I do not see the examples that you referenced in your first post, nor any sample where the number of nonchimeric reads is more than the input.

IK237 commented 2 weeks ago

Hello, I think I have a similar issue to the one above. I'm sequencing the ITS2 region for fungal reads, and I've noticed a few samples with a low number of input reads and a significantly higher number of reads after the later steps of the pipeline.

Here is the code I am running to get these results: ITS Sequencing Code.txt

Here is a list of the reads tracked through the pipeline: ITS Read Track.csv

JKA010219ITS, JKA010293ITS, and JKA010580ITS are a few of my samples that do this.

I've looked but can't seem to find any other reports of this happening. I have generally kept the default pipeline for ITS reads in DADA2. My primers are ITS4 and fITS7. Thank you for your help!

Abdelrahim-maker commented 2 weeks ago

@IK237 I found code to adjust my table in https://github.com/benjjneb/dada2/issues/710 and it helped me. Now my reads are fine. Hopefully this will solve your issue too.

IK237 commented 2 weeks ago

I ended up fixing this by going to a related post on this issue, found at https://github.com/benjjneb/dada2/issues/715, so thanks for that info, Abdelrahim-maker.

I fixed this by first checking the number of samples at each step:

nrow(out)
[1] 177
length(dadaFs)
[1] 171
length(dadaRs)
[1] 171
length(mergers)
[1] 171
nrow(seqtab)
[1] 171
nrow(seqtab.nochim)
[1] 171

The numbers generated here should all be equal, but nrow(out) did not match the others for me. To fix this, I ran this code to assign an "exists" variable.

exists <- file.exists(filtFs)
table(exists)
exists
FALSE  TRUE
    6   171

The comma in out = out[exists,] below tells R to subset the rows of out (keeping all columns), and it is necessary as far as I can tell.

out = out[exists,]
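As a minimal sketch of what that row subsetting does, here is a toy example in base R. The matrix `out_toy`, the vector `exists_toy`, the sample names, and the counts are all made up for illustration and do not come from this thread; only the indexing pattern is the point.

```r
# Toy stand-in for the matrix returned by filterAndTrim():
# one row per input sample. All names and numbers are hypothetical.
out_toy <- matrix(
  c(1000, 800,
       8,   0,
     500, 450),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("reads.in", "reads.out"))
)

# Stand-in for file.exists(filtFs): in this toy case S2 has no
# filtered file on disk.
exists_toy <- c(TRUE, FALSE, TRUE)

# The logical vector before the comma selects rows; the empty slot
# after the comma keeps all columns. drop = FALSE keeps the result
# a matrix even if only one row survives.
kept <- out_toy[exists_toy, , drop = FALSE]

print(kept)
```

Without the comma, `out_toy[exists_toy]` would treat the matrix as a plain vector and return a subset of its elements rather than whole rows.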

Running the check again shows that all sample counts are equal, so it should be safe to run the cbind command.

nrow(out)
[1] 171
length(dadaFs)
[1] 171
length(dadaRs)
[1] 171
length(mergers)
[1] 171
nrow(seqtab)
[1] 171
nrow(seqtab.nochim)
[1] 171
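To see why mismatched lengths can make a sample appear to gain reads, here is a rough base R sketch with made-up numbers (not the data from this thread). It assumes the tracking table is built with cbind(), as in the read tracking section of the tutorial, and that the shorter vector is missing a sample that dropped out at filtering. The names `out_toy` and `nonchim_toy` are hypothetical.

```r
# Hypothetical filter output for 4 samples; S2 has no reads left.
out_toy <- matrix(
  c(1000,  900,
       8,    0,
    5000, 4800,
     700,  650),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3", "S4"), c("input", "filtered"))
)

# Hypothetical non-chimeric counts for the 3 samples that made it
# through (S1, S3, S4). S2 is absent, so this vector is shorter.
nonchim_toy <- c(S1 = 850, S3 = 4500, S4 = 600)

# cbind() recycles the shorter vector (with a warning) instead of
# matching by sample name, so rows after S1 are misaligned.
bad <- suppressWarnings(cbind(out_toy, nonchim = nonchim_toy))
print(bad)

# Dropping the missing sample from the filter output first
# restores the row-by-row alignment.
keep <- rownames(out_toy) %in% names(nonchim_toy)
good <- cbind(out_toy[keep, , drop = FALSE], nonchim = nonchim_toy)
print(good)
```

In the misaligned table, the row for S2 pairs its small input with another sample's non-chimeric count, which is the same kind of symptom reported above. In practice R warns about the recycling, so a warning from cbind() at the tracking step is worth checking before trusting the table.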