benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/

memory leak during learnErrors #1778

Open Adwyness opened 1 year ago

Adwyness commented 1 year ago

Hi Ben,

Using R/RStudio on Windows and looping a dada2 pipeline over several datasets is jamming up the non-paged pool memory. Individual datasets run with <15 GB, but after 4 or 5 datasets this obviously becomes an issue even on a big machine. A bit of testing shows non-paged memory increases during learnErrors() but cannot be released with gc() or rm() of any of the objects involved. A bit of reading shows memory not releasing properly to the OS is a well-known pain in the Rs. Have you any thoughts/experience on this? Will update with any progress.

Cheers! Ad
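For reference, a minimal sketch of the kind of loop I'm running (the `runs` directory layout, filename pattern, and output names are placeholders, not my actual code):

```r
library(dada2)

# Illustrative loop over several datasets; paths/patterns are placeholders.
datasets <- list.dirs("runs", recursive = FALSE)

for (run in datasets) {
  fns <- sort(list.files(run, pattern = "\\.fastq\\.gz$", full.names = TRUE))
  err <- learnErrors(fns, multithread = TRUE)   # non-paged pool grows here
  dd  <- dada(fns, err = err, multithread = TRUE)
  saveRDS(dd, file.path(run, "dada.rds"))

  # Attempting to release memory between datasets; neither call shrinks
  # the non-paged pool usage reported by Windows:
  rm(err, dd)
  gc()
}
```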

Adwyness commented 1 year ago

Update

It occurs during multithreading.

R version 4.3.0
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
dada2 version: 1.28.0
RcppParallel version: 5.1.7

Test data: 52 x 300 bp PE MiSeq samples, 1.95 GB in total as fastq.gz. 11098880 total bases in 48256 reads from 1 sample used for learning the error rates. Seed set before each iteration.

learnErrors( inputfiles, nbases = 1e7, nreads = NULL, errorEstimationFunction = loessErrfun, multithread = x, randomize = FALSE, MAX_CONSIST = 10, OMEGA_C = 0, qualityType = "Auto", verbose = 1)

| Cores | Memory leaked (MB) | Time (s) |
| --- | --- | --- |
| FALSE | 4 | 324 |
| 2 | 42 | 374 |
| 4 | 102 | 401 |
| 8 | 139 | 233 |
| TRUE (12) | 398 | 243 |
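In case it's useful, here is a sketch of how numbers like these can be gathered. It assumes PowerShell and its standard `\Memory\Pool Nonpaged Bytes` performance counter; the `npp_bytes()` wrapper is hypothetical, not part of dada2:

```r
# Hypothetical helper: query the Windows non-paged pool size (bytes)
# through PowerShell's standard performance counter.
npp_bytes <- function() {
  cmd <- "(Get-Counter '\\Memory\\Pool Nonpaged Bytes').CounterSamples.CookedValue"
  out <- system2("powershell", c("-NoProfile", "-Command", shQuote(cmd)),
                 stdout = TRUE)
  as.numeric(out)
}

before <- npp_bytes()
t0 <- Sys.time()
err <- dada2::learnErrors(inputfiles, multithread = 4, verbose = 1)
elapsed_s <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
leaked_mb <- (npp_bytes() - before) / 1024^2
```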

This problem increases substantially when inferring sequences, as that step is more memory-intensive:

dada(inputfiles, err = errors, multithread = x, verbose = 1)

| Cores | Memory leaked (MB) | Time (s) |
| --- | --- | --- |
| TRUE (12) | 3167 | 1480 |

When the non-paged pool (NPP) gets to a certain level (in my case ~4.5 GB on a 16 GB RAM laptop), R isn't crashing, but it takes a lot longer to process each sample. Re-running the above command increased the memory allocated to the NPP at the same rate as the first run (~2 MB/s) until it reached 4.5 GB, where it slowed to a relative trickle (~400 kB/s); per-sample processing went from ~30 s to ~180 s and seems to still be increasing (running it currently). As inferring sequences for the forward reads took 3.1 GB, that means restarting the R session to get the reverse reads done without processing slowing significantly.
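One workaround I'm considering, sketched below: run each dada() call in a throwaway child R process so the OS reclaims the process's non-paged allocations when the child exits. This uses the callr package (my assumption, not something dada2 provides); `filtFs`/`filtRs` and the `.rds` paths are placeholders:

```r
library(callr)

# Sketch: run dada() in a fresh child R process so its non-paged
# allocations are returned to the OS when the child exits.
run_dada <- function(files, err_rds, out_rds) {
  callr::r(
    function(files, err_rds, out_rds) {
      err <- readRDS(err_rds)
      dd  <- dada2::dada(files, err = err, multithread = TRUE, verbose = 1)
      saveRDS(dd, out_rds)
      NULL
    },
    args = list(files = files, err_rds = err_rds, out_rds = out_rds)
  )
}

# Placeholders: filtFs/filtRs are filtered read paths, errF.rds/errR.rds
# are error models previously saved from learnErrors().
run_dada(filtFs, "errF.rds", "dadaFs.rds")
run_dada(filtRs, "errR.rds", "dadaRs.rds")
```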

Apologies if this has now turned into a rehash of other Windows/Rcpp multithreading issues.

Cheers, Ad