benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

BigDataPaired - Error in loop sequence #1281

Open clairewoodall opened 3 years ago

clairewoodall commented 3 years ago

Hello Ben, I'm very new to R and I'm sure this is very basic, but I hope you can help. I've checked StackOverflow and can't seem to find an answer to my issue.

I'm running 208 paired-end sequence files using your excellent tutorial here, https://benjjneb.github.io/dada2/bigdata_paired.html

But I'm stuck on the following section, Sample inference and merger of paired-end reads:

mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for(sam in sample.names) {
  cat("Processing:", sam, "\n")
  derepF <- derepFastq(filtFs[[sam]])
  ddF <- dada(derepF, err=errF, multithread=TRUE)
  derepR <- derepFastq(filtRs[[sam]])
  ddR <- dada(derepR, err=errR, multithread=TRUE)
  merger <- mergePairs(ddF, derepF, ddR, derepR)
  mergers[[sam]] <- merger
}
rm(derepF); rm(derepR)

Should my script be written like this, Part A:

mergersF <- vector("list", length(sample.namesF))
mergersR <- vector("list", length(sample.namesR))
names(mergers) <- sample.namesF
names(mergers) <- sample.namesR
for(sam in sample.names) {
  cat("Processing:", sam, "\n")

Or, like this, Part B:

mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for(sam in sample.names) {
  cat("Processing:", sam, "\n")

Part A seems like the best thing to do to keep the forward and reverse reads apart before they are merged.

However, for the looped section I get an error message which I can't fix, see below:

Error in for (sam in sample.names) { : invalid for() loop sequence

Is this something to do with the actual name of my sequence files?

I've replaced the 'sam' with a 'D' (which is common to all the names of the files) and I still get this error message:

Error in for (D in sample.names) { : invalid for() loop sequence

I'm not sure what to do next. I would appreciate a bit of your time to fix this so I can move on to the next stage.

Thank you C

benjjneb commented 3 years ago

First, I suggest you try using the standard tutorial workflow and see if that solves your issues. There have been significant upgrades to the package since the Big Data workflow was published. The tutorial workflow now controls memory (nearly) as well as the Big Data workflow does, and its code is a bit easier to follow and more robust: https://benjjneb.github.io/dada2/tutorial.html

Second, right now your code is hard to read because of the mis-formatting. What you want to do is to put code in these comments in between "fences" of three backticks. See a guide on markdown here: https://guides.github.com/pdfs/markdown-cheatsheet-online.pdf
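On the error itself, here is a minimal base-R sketch rather than a diagnosis of the actual script. R reports "invalid for() loop sequence" when the object after `in` is not a vector or list, for example when the name resolves to a function instead of a character vector of sample names. The sample names and the `not.a.vector` function below are hypothetical, and the string stored in `mergers` is only a placeholder for the `mergePairs()` result.

```r
# Hypothetical sample names; in the workflow these are derived from the file names.
sample.names <- c("D01", "D02", "D03")

# "Part B" pattern: one pre-allocated list with one named slot per sample.
mergers <- vector("list", length(sample.names))
names(mergers) <- sample.names
for (sam in sample.names) {
  # Placeholder value standing in for the mergePairs() output of this sample.
  mergers[[sam]] <- paste("merged reads for", sam)
}

# Reproduce the error: the loop sequence here is a function, not a vector.
not.a.vector <- function() NULL
msg <- tryCatch(
  {
    for (sam in not.a.vector) cat("Processing:", sam, "\n")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Checks worth running on the real object before starting the loop.
stopifnot(is.character(sample.names), length(sample.names) > 0)
```

Running `class(sample.names)` and `head(sample.names)` just before the loop in the real script should show whether it is the character vector the loop expects. Renaming the loop variable (`sam` to `D`) cannot change the outcome, because the error concerns the sequence after `in`, not the variable before it.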

clairewoodall commented 3 years ago

Hi Ben, Thank you for your swift response. I tried the Bioconductor standard tutorial and found that RStudio didn't have enough memory for the dereplication stage. This is why I switched to the BigData workflow. But now I can see that the new robust standard tutorial doesn't have the dereplication section. So I'll give it a go first thing tomorrow! Thanks so much and have a good evening. C

benjjneb commented 3 years ago

We should update the Bioconductor vignette to use the new tutorial workflow.