benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

filterAndTrim #2007

Open AliciaBalbin opened 2 weeks ago

AliciaBalbin commented 2 weeks ago

Hi, I would like to ask how `filterAndTrim` works. I run it with my forward and reverse reads, but I think it is only filtering the forward reads; at least in the output I only find my forward reads. However, I am not getting any error. I guess the function does the filtering, and afterwards `filtFs` and `filtRs` should be written directly to the folder? How can I check this? I use the following commands:

```r
cpu <- 40
out <- filterAndTrim(cutFs, filtFs, cutRs, filtRs,
                     maxN = 0, maxEE = c(1, 1), truncQ = 2, rm.phix = TRUE,
                     minLen = 100, compress = TRUE, multithread = cpu, verbose = TRUE)
out
```

Besides, I also got a script from a friend where they do a dereplication before denoising. And indeed your pipeline says "At this step, the core sample inference algorithm is applied to the dereplicated data." Is this step missing here?

```r
# dereplication
derepFs <- derepFastq(filtFs, verbose = TRUE)

# denoise reads
dadaFs <- dada(derepFs, err = errF, multithread = cpu, pool = FALSE)
```

Thank you very much :)

benjjneb commented 1 week ago

> Hi, I would like to ask how `filterAndTrim` works. I run it with my forward and reverse reads, but I think it is only filtering the forward reads; at least in the output I only find my forward reads. However, I am not getting any error. I guess the function does the filtering, and afterwards `filtFs` and `filtRs` should be written directly to the folder? How can I check this?

`filterAndTrim` looks at each forward-reverse read pair and makes filtering decisions jointly. That is, it keeps or discards the whole pair, and never, e.g., keeps the forward read from a pair while discarding the reverse read. This keeps the filtered output in matched order.
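One way to sanity-check that filtering really was paired: since pairs are kept or dropped together, the forward and reverse filtered files must contain identical numbers of reads. A minimal sketch, assuming the ShortRead package (a dada2 dependency) is available and using the `filtFs`/`filtRs` path vectors from the command above:

```r
library(ShortRead)

# Because pairs are kept or dropped together, the forward and
# reverse filtered files should contain identical read counts.
n_fwd <- countFastq(filtFs)$records
n_rev <- countFastq(filtRs)$records
all(n_fwd == n_rev)  # should be TRUE if filtering was truly paired
```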

I'm not sure why you are finding only your forward reads, but I would start by inspecting the filepaths you are providing to `filterAndTrim` and making sure they look appropriate. For example, what do `head(cutFs)`, `head(filtFs)`, `head(cutRs)`, `head(filtRs)` look like? Is all as expected? You can also use the `file.exists(filtRs)` function to check if the files exist after running `filterAndTrim`.
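For concreteness, a sketch of those checks, using the filepath vectors from the command above:

```r
# Do the input/output path vectors line up, one file per sample?
head(cutFs); head(filtFs)
head(cutRs); head(filtRs)
length(filtFs) == length(filtRs)  # should be TRUE

# After running filterAndTrim, were the reverse output files written?
file.exists(filtRs)       # logical vector, one entry per sample
sum(file.exists(filtRs))  # how many reverse files actually exist on disk
```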

> Besides, I also got a script from a friend where they do a dereplication before denoising. And indeed your pipeline says "At this step, the core sample inference algorithm is applied to the dereplicated data." Is this step missing here?
>
> dereplication

No, it isn't missing. A few years ago we implemented dereplication "on the fly" in `learnErrors` and `dada`. This is preferred because it can dramatically reduce memory usage: only one sample is loaded into memory at a time, whereas the previous method of running `derepFastq` explicitly loaded all the samples into memory at once.
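In other words, the explicit `derepFastq` call can simply be dropped: `learnErrors` and `dada` accept the filtered filepaths directly and dereplicate each sample internally. A sketch of the equivalent steps under the current interface, using the `filtFs` paths from above:

```r
library(dada2)

# learnErrors and dada accept filepaths directly and dereplicate
# one sample at a time, keeping memory usage low
errF   <- learnErrors(filtFs, multithread = TRUE)
dadaFs <- dada(filtFs, err = errF, multithread = TRUE, pool = FALSE)
```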