benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Dada2 "learnErrors" error #336

Closed: drish91 closed this issue 7 years ago

drish91 commented 7 years ago

The filtering and dereplication worked out perfectly, but now I'm having trouble training the parametric error model. I'm following the big data tutorial but end up getting this error:

set.seed(100)
errF <- learnErrors(filtFs, nread=1e5, multithread=FALSE)
Initializing error rates to maximum possible estimate.
Error in dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques), :
  std::bad_alloc

I also tried changing the multithreading parameter to TRUE, but that just killed the job.

I also tried calling dada() directly, but that job was killed as well:

dadaFs.lrn <- dada(derepFs, err=NULL, selfConsist = TRUE, multithread=FALSE)
Initializing error rates to maximum possible estimate.
Killed

I checked the script, and I'm curious about something. Why does the script look for "derep[[i]]" instead of derepF[[i]]?

names(derepFs[[1]])
[1] "uniques" "quals" "map"
names(derepRs[[1]])
[1] "uniques" "quals" "map"

I'm running a qlogin session, and I have 36 samples in total.

Thanks, Drishti

benjjneb commented 7 years ago

So... this seems to be the same error as before in #333

Furthermore, I believe that this error is being caused by the Rcpp package and not (directly) by dada2. Within the dada2 C code we use malloc to allocate memory, and we catch allocation errors ourselves. These std::bad_alloc errors therefore probably indicate issues arising in the Rcpp code.

Unfortunately, at this point the best next step would be to try (1) reinstalling Rcpp and RcppParallel, and then reinstalling the dada2 R package, and if that doesn't work (2) reinstalling R and then (1).

Is this possible in your computing environment? Also, what do R.version, packageVersion("Rcpp") and packageVersion("RcppParallel") report in the environment where you are seeing this error?
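For anyone hitting the same error, the version information requested above can be gathered in one go. The following is only a minimal sketch: the list of packages is taken from this thread, and the helper name `pkg_version` is made up for illustration. It uses base R only and reports NA for a package that is not installed instead of stopping.

```r
# Packages mentioned in this thread (adjust as needed).
pkgs <- c("Rcpp", "RcppParallel", "dada2", "ShortRead")

# Hypothetical helper: installed version as a string, or NA if the package is missing.
pkg_version <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

versions <- vapply(pkgs, pkg_version, character(1))

cat(R.version.string, "\n")
print(versions)
```

Pasting the printed output into the issue gives the same information as calling packageVersion() once per package.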

drish91 commented 7 years ago

Hi Ben!

They're all the most recent versions:

packageVersion("Rcpp")
[1] ‘0.12.12’

packageVersion("RcppParallel")
[1] ‘4.3.20’

packageVersion("dada2")
[1] ‘1.5.2’

packageVersion("ShortRead")
[1] ‘1.34.1’

R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          4.1
year           2017
month          06
day            30
svn rev        72865
language       R
version.string R version 3.4.1 (2017-06-30)
nickname       Single Candle

I recently installed the newest R (and the packages that go with it) so I'm a little hesitant to install everything again. So I'm presuming you couldn't reproduce the error on your end?

drish91 commented 7 years ago

Additionally, I'm not sure why it would throw an error in dada_uniques() though? Is it because it's calling derep instead of derepFs or derepRs, since there's no object called "derep" in my data!

Error in dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques)

benjjneb commented 7 years ago

> Additionally, I'm not sure why it would throw an error in dada_uniques() though? Is it because it's calling derep instead of derepFs or derepRs, since there's no object called "derep" in my data!

The derep variable name is the name used inside the dada function, where the dada_uniques call is made.
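This naming behaviour can be reproduced without dada2. In the sketch below, `toy_dada` and `toy_uniques` are hypothetical stand-ins, not the package's real code. They only show that R reports the failing call as it is written inside the function, using the function's own argument name rather than the name of the object the user passed in.

```r
# Hypothetical stand-in for the internal routine; it always fails.
toy_uniques <- function(seqs, counts) stop("std::bad_alloc")

# Hypothetical stand-in for dada(); its argument happens to be named `derep`.
toy_dada <- function(derep) {
  for (i in seq_along(derep)) {
    toy_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques))
  }
}

# The user's object has a different name.
derepFs <- list(list(uniques = c(ACGT = 10L, ACGA = 3L)))

# Capture the call recorded in the error instead of stopping the script.
reported_call <- tryCatch(
  toy_dada(derepFs),
  error = function(e) paste(deparse(conditionCall(e)), collapse = " ")
)
print(reported_call)
```

The printed call mentions `derep[[i]]`, not `derepFs`, even though `derepFs` is what was passed in. Seeing `derep` in the error message is therefore expected and is not itself a sign of a bug.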

benjjneb commented 7 years ago

I don't see any obvious issues in the package versions you are using.

Could you try to install the current release version of the dada2 package from Bioconductor, and see if that resolves things? E.g.

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("dada2")

That should get you packageVersion("dada2") of 1.4.0

drish91 commented 7 years ago

packageVersion("dada2")
[1] ‘1.5.2’

I'll try downgrading to 1.4.0, and let you know if that fixes anything. Any particular reason for that?

drish91 commented 7 years ago

That works!!

Any particular reason why the earlier version worked out?

benjjneb commented 7 years ago

Great! Probably that would solve the earlier problem with the filtering as well.

I don't completely understand what happened, but it likely has something to do with the compilation of the C code. The functions in dada2 that are written in C (for performance reasons) use Rcpp as the "glue" between C and R. If the Rcpp package and the dada2 package were compiled differently, then problems could arise, but to be honest those details go beyond my expertise.

mkierczak commented 5 years ago

Hi,

The same is happening to me. It has never been an issue with previous versions of dada2. Now:

3690556800 total bases in 24603712 reads from 1 samples will be used for learning the error rates.
Initializing error rates to maximum possible estimate.
Error in dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques), :
  Memory allocation failed.
Calls: learnErrors -> dada -> dada_uniques
Execution halted
Warning message:
system call failed: Cannot allocate memory

Package versions:

packageVersion('Rcpp')
[1] '1.0.0'

packageVersion('RcppParallel')
[1] '4.4.2'

packageVersion('ShortRead')
[1] '1.40.0'

packageVersion('dada2')
[1] '1.8.0'

R.version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          5.2
year           2018
month          12
day            20
svn rev        75870
language       R
version.string R version 3.5.2 (2018-12-20)
nickname       Eggshell Igloo

I am running Ubuntu Xenial, 16GB RAM.

Tried downgrading to previous R and dada2 -- no effect. Everything installed from scratch via Bioconductor. Happens on different datasets, not just one. Any clues?

Strangely enough, on my OS X laptop (also 16GB RAM) I managed to run the same code on the same datasets successfully, but I had to close all other applications and go offline. Otherwise memory usage gets too high and everything crashes. Could a memory leak have been introduced with some Rcpp version?
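As a rough sanity check on the numbers in the log above, the reported totals can be turned into a read length and a lower bound on the size of the raw sequence data. This is only back-of-the-envelope arithmetic. It assumes one byte per base and ignores quality scores, dereplication, and whatever working memory dada2 itself needs, so it says nothing definite about actual memory use.

```r
# Totals copied from the learnErrors message in the comment above.
total_bases <- 3690556800
total_reads <- 24603712

# Average read length implied by the two totals.
mean_read_length <- total_bases / total_reads

# Assumption: 1 byte per base, sequence characters only (no quality scores).
raw_gib <- total_bases / 2^30

cat("Mean read length:", mean_read_length, "bases\n")
cat("Raw bases at 1 byte each:", round(raw_gib, 2), "GiB\n")
```

Even under these optimistic assumptions, the bases from this single sample take up a noticeable share of a 16GB machine before any processing starts.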