benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Problem with the folder filtN #1388

Closed Deborahgn closed 3 months ago

Deborahgn commented 3 years ago

Hello, we are facing an issue and are running out of options to fix it.

In the pre-filtration step, the call filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE) is supposed to write our samples to the filtN folder. The problem is that the files appear in pairs in the folder while the function is running, then disappear a few seconds later, leaving the folder empty. Yet fnFs and fnRs exist with the right number of samples (210 files in each).

We tried adding the matchIDs = TRUE argument to the function, but with no success. We checked the format of our samples and their names and did not see any problem. We also ran our script on other samples and it worked.

Would you have any idea how to fix this problem?

Here is the piece of code that we are using :

fnFs <- sort(list.files(fastq, pattern = "_R1_001.fastq.gz", full.names = TRUE))
fnRs <- sort(list.files(fastq, pattern = "_R2_001.fastq.gz", full.names = TRUE))

# Pre filtration
fnFs.filtN <- file.path(fastq, "filtN", basename(fnFs)) # Put N-filtered files in filtN/ subdirectory
fnRs.filtN <- file.path(fastq, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE, matchIDs = TRUE)

benjjneb commented 3 years ago

My first guess is that this is an issue with the assigned file paths. Can you show the output of:

fnFs[1:2]
fnRs[1:2]
fnFs.filtN[1:2]
fnRs.filtN[1:2]

Deborahgn commented 3 years ago

Thank you very much for your fast answer. Here are the outputs:

fnFs[1:2]
[1] "./Fastq/10A_S56_L001_R1_001.fastq.gz"
[2] "./Fastq/10B_S68_L001_R1_001.fastq.gz"

fnRs[1:2]
[1] "./Fastq/10A_S56_L001_R2_001.fastq.gz"
[2] "./Fastq/10B_S68_L001_R2_001.fastq.gz"

fnFs.filtN[1:2]
[1] "./Fastq/filtN/10A_S56_L001_R1_001.fastq.gz"
[2] "./Fastq/filtN/10B_S68_L001_R1_001.fastq.gz"

fnRs.filtN[1:2]
[1] "./Fastq/filtN/10A_S56_L001_R2_001.fastq.gz"
[2] "./Fastq/filtN/10B_S68_L001_R2_001.fastq.gz"

What seems weird is that when we run the DADA2 analysis on other files, the problem doesn't occur. The paths in the outputs seem to be correct, so we are unsure how to proceed.

Thanks again for your time, we appreciate it !

benjjneb commented 3 years ago

What is the output of filterAndTrim(..., verbose=TRUE) on these files that are appearing and then disappearing? It may be that no reads are passing the filters.

Another thing to look at would be table(file.exists(fnFs)) and table(file.exists(fnFs.filtN)).
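One way to test the "no reads are passing the filters" hypothesis independently of console messages is to count N-free read pairs directly. What follows is a minimal base-R sketch, not part of DADA2: `read_fastq_seqs` and `count_n_free_pairs` are hypothetical helpers, and the sketch assumes standard four-line FASTQ records, the same read order in both files, and that under maxN = 0 a pair is kept only when neither read contains an N.

```r
# Hypothetical helpers (not DADA2 functions).
# Assumes four-line FASTQ records and identical read order in both files.

# Return the sequence lines (line 2 of every record) of a FASTQ file.
read_fastq_seqs <- function(fn) {
  con <- gzfile(fn, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  if (length(lines) < 2) return(character(0))
  lines[seq(2, length(lines), by = 4)]
}

# Count pairs in which neither the forward nor the reverse read has an N.
count_n_free_pairs <- function(fwd, rev) {
  f <- read_fastq_seqs(fwd)
  r <- read_fastq_seqs(rev)
  stopifnot(length(f) == length(r))
  has_n <- grepl("N", f, fixed = TRUE) | grepl("N", r, fixed = TRUE)
  c(pairs.in = length(f), pairs.n.free = sum(!has_n))
}
```

Calling `count_n_free_pairs(fnFs[1], fnRs[1])` on one sample would then show whether any pair could survive an N filter; a `pairs.n.free` of 0 would be consistent with every read being removed.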

Deborahgn commented 3 years ago

When adding the verbose parameter, there is no specific output. In the console it's as if everything worked. There is no error message :

# Pre filtration
fnFs.filtN <- file.path(fastq, "filtN", basename(fnFs)) # Put N-filtered files in filtN/ subdirectory
fnRs.filtN <- file.path(fastq, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = TRUE, verbose = TRUE)

primerHits <- function(primer, fn) {
    # Counts number of reads in which the primer is found
    nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
    return(sum(nhits > 0))
}

And those are the outputs of what you asked :

table(file.exists(fnFs))

TRUE 208

table(file.exists(fnFs.filtN))

FALSE 208

We actually have one question because like you, we thought that no reads were passing the filters, but we were wondering why because the filters we apply aren't very stringent ?

And what's even more confusing to us is that we tried to bypass this first step by putting our files manually inside the filtN directory and at the end, we did have some reads that did go through the rest of the process.

Once again, thank you so much for your time, we really do appreciate it.

benjjneb commented 3 years ago

> When adding the verbose parameter, there is no specific output.

Can you reproduce the output here? Perhaps from filtering just the first 2 files to be concise, and with multithread=FALSE as well. I'm still not sure from what you have posted so far whether or not all reads are being lost in the filter.

Deborahgn commented 3 years ago

I set multithread = FALSE and only filtered the first two files and as before the output in the console was :

# Pre filtration
fnFs.filtN <- file.path(fastq, "filtN", basename(fnFs)) # Put N-filtered files in filtN/ subdirectory
fnRs.filtN <- file.path(fastq, "filtN", basename(fnRs))
filterAndTrim(fnFs, fnFs.filtN, fnRs, fnRs.filtN, maxN = 0, multithread = FALSE)

primerHits <- function(primer, fn) {
    # Counts number of reads in which the primer is found
    nhits <- vcountPattern(primer, sread(readFastq(fn)), fixed = FALSE)
    return(sum(nhits > 0))
}

There was no specific output, and the files are still not appearing.

cresil commented 3 years ago

Check your quality profile. It could be that one sequencing cycle failed and all sequences have N calls.

I've seen ITS runs where the 3rd cycle failed and all reads contained an N at this position.
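A failed cycle of this kind can be looked for without plotting by tabulating how often an N is called at each read position. The following is a minimal base-R sketch under stated assumptions, not a DADA2 function: `n_fraction_by_cycle` is a hypothetical helper that assumes four-line FASTQ records and reports, for each position, the fraction of all reads in the file with an N there.

```r
# Hypothetical helper (not a DADA2 function).
# Returns, for each cycle (read position), the fraction of all reads in the
# file that have an N at that position. Assumes four-line FASTQ records.
n_fraction_by_cycle <- function(fn) {
  con <- gzfile(fn, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  if (length(lines) < 2) return(numeric(0))
  seqs <- lines[seq(2, length(lines), by = 4)]  # sequence lines only
  n_counts <- integer(max(nchar(seqs)))
  for (pos in gregexpr("N", seqs, fixed = TRUE)) {
    pos <- pos[pos > 0]  # gregexpr() returns -1 when a read has no N
    n_counts[pos] <- n_counts[pos] + 1L
  }
  n_counts / length(seqs)
}
```

For example, `which(n_fraction_by_cycle(fnFs[1]) > 0.9)` would list positions where almost every read has an N. If a single early position stands out, trimming past it before applying the N filter may be worth considering.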
