benjjneb / dada2

Accurate sample inference from amplicon data with single nucleotide resolution
http://benjjneb.github.io/dada2/
GNU Lesser General Public License v3.0

Segmentation fault in learnErrors using bioconda dada2 1.10 #684

Closed apcamargo closed 5 years ago

apcamargo commented 5 years ago

I'm running into segmentation faults when trying to use learnErrors. This happened with versions 1.8.0 and 1.10.0. I don't have this problem with version 1.6.0.

Here's the traceback when I was using version 1.10.0.

116200805 total bases in 474289 reads from 3 samples will be used for learning the error rates.
111156660 total bases in 617537 reads from 4 samples will be used for learning the error rates.

 *** caught segfault ***
address 0xfffffffffffffff7, cause 'memory not mapped'

 *** caught segfault ***
address 0xfffffffffffffff7, cause 'memory not mapped'

 *** caught segfault ***
address 0xfffffffffffffff7, cause 'memory not mapped'

Traceback:
 1: dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques),     names(derep[[i]]$uniques) %in% c(priors, pseudo_priors),     err, qi, opts[["MATCH"]], opts[["MISMATCH"]], opts[["GAP_PENALTY"]],     opts[["USE_KMERS"]], opts[["KDIST_CUTOFF"]], opts[["BAND_SIZE"]],     opts[["OMEGA_A"]], opts[["OMEGA_P"]], opts[["OMEGA_C"]],     if (initializeErr) {        1    } else {        opts[["MAX_CLUST"]]    }, opts[["MIN_FOLD"]], opts[["MIN_HAMMING"]], opts[["MIN_ABUNDANCE"]],     TRUE, FALSE, opts[["VECTORIZED_ALIGNMENT"]], opts[["HOMOPOLYMER_GAP_PENALTY"]],     multithread, (verbose >= 2), opts[["SSE"]], opts[["GAPLESS"]],     opts[["GREEDY"]])
 2: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,     selfConsist = TRUE, multithread = multithread, verbose = verbose,     MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
 3: learnErrors(filtRs.16s, MAX_CONSIST = 40, multithread = snakemake@threads)
An irrecoverable exception occurred. R is aborting now ...

Traceback:
 1: dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques),     names(derep[[i]]$uniques) %in% c(priors, pseudo_priors),     err, qi, opts[["MATCH"]], opts[["MISMATCH"]], opts[["GAP_PENALTY"]],     opts[["USE_KMERS"]], opts[["KDIST_CUTOFF"]], opts[["BAND_SIZE"]],     opts[["OMEGA_A"]], opts[["OMEGA_P"]], opts[["OMEGA_C"]],     if (initializeErr) {        1    } else {        opts[["MAX_CLUST"]]    }, opts[["MIN_FOLD"]], opts[["MIN_HAMMING"]], opts[["MIN_ABUNDANCE"]],     TRUE, FALSE, opts[["VECTORIZED_ALIGNMENT"]], opts[["HOMOPOLYMER_GAP_PENALTY"]],     multithread, (verbose >= 2), opts[["SSE"]], opts[["GAPLESS"]],     opts[["GREEDY"]])
 2: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,     selfConsist = TRUE, multithread = multithread, verbose = verbose,     MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
 3: learnErrors(filtRs.16s, MAX_CONSIST = 40, multithread = snakemake@threads)
An irrecoverable exception occurred. R is aborting now ...

Traceback:
 1: dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques),     names(derep[[i]]$uniques) %in% c(priors, pseudo_priors),     err, qi, opts[["MATCH"]], opts[["MISMATCH"]], opts[["GAP_PENALTY"]],     opts[["USE_KMERS"]], opts[["KDIST_CUTOFF"]], opts[["BAND_SIZE"]],     opts[["OMEGA_A"]], opts[["OMEGA_P"]], opts[["OMEGA_C"]],     if (initializeErr) {        1    } else {        opts[["MAX_CLUST"]]    }, opts[["MIN_FOLD"]], opts[["MIN_HAMMING"]], opts[["MIN_ABUNDANCE"]],     TRUE, FALSE, opts[["VECTORIZED_ALIGNMENT"]], opts[["HOMOPOLYMER_GAP_PENALTY"]],     multithread, (verbose >= 2), opts[["SSE"]], opts[["GAPLESS"]],     opts[["GREEDY"]])
 2: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,     selfConsist = TRUE, multithread = multithread, verbose = verbose,     MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
 3: learnErrors(filtRs.16s, MAX_CONSIST = 40, multithread = snakemake@threads)
An irrecoverable exception occurred. R is aborting now ...

 *** caught segfault ***
address 0xfffffffffffffff7, cause 'memory not mapped'

Traceback:
 1: dada_uniques(names(derep[[i]]$uniques), unname(derep[[i]]$uniques),     names(derep[[i]]$uniques) %in% c(priors, pseudo_priors),     err, qi, opts[["MATCH"]], opts[["MISMATCH"]], opts[["GAP_PENALTY"]],     opts[["USE_KMERS"]], opts[["KDIST_CUTOFF"]], opts[["BAND_SIZE"]],     opts[["OMEGA_A"]], opts[["OMEGA_P"]], opts[["OMEGA_C"]],     if (initializeErr) {        1    } else {        opts[["MAX_CLUST"]]    }, opts[["MIN_FOLD"]], opts[["MIN_HAMMING"]], opts[["MIN_ABUNDANCE"]],     TRUE, FALSE, opts[["VECTORIZED_ALIGNMENT"]], opts[["HOMOPOLYMER_GAP_PENALTY"]],     multithread, (verbose >= 2), opts[["SSE"]], opts[["GAPLESS"]],     opts[["GREEDY"]])
 2: dada(drps, err = NULL, errorEstimationFunction = errorEstimationFunction,     selfConsist = TRUE, multithread = multithread, verbose = verbose,     MAX_CONSIST = MAX_CONSIST, OMEGA_C = OMEGA_C, ...)
 3: learnErrors(filtRs.16s, MAX_CONSIST = 40, multithread = snakemake@threads)
An irrecoverable exception occurred. R is aborting now ...

ebolyen commented 5 years ago

I was about to respond that we had tried that and couldn't make it stick, but looking at my notes, we were actually trying CPPFLAGS (CPPFLAGS="$($R CMD config CPPFLAGS) -flifetime-dse=1"). Oops >_<

That said, there may be other flags/configuration happening, so the missing gcc during the make setup phase has me a little nervous anyway.
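
For context on the CPPFLAGS mix-up: by convention CPPFLAGS carries preprocessor options, while CXXFLAGS carries C++ compiler options, so an optimization-related flag such as -flifetime-dse=1 would normally go in the latter. Below is a minimal sketch of that variant. The `append_flag` helper is hypothetical, and the thread does not confirm that the final fix took this form.

```shell
# Hypothetical helper: append a flag to a flag string unless it is already present.
append_flag() {
    case " $1 " in
        *" $2 "*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "${1:+$1 }$2" ;;
    esac
}

# Assumption: R is on the PATH; otherwise start from an empty flag string.
base_flags=""
if command -v R >/dev/null 2>&1; then
    base_flags="$(R CMD config CXXFLAGS)"
fi

# Extend the C++ compiler flags (not the preprocessor flags) with the GCC option.
CXXFLAGS="$(append_flag "$base_flags" "-flifetime-dse=1")"
export CXXFLAGS
echo "CXXFLAGS=$CXXFLAGS"
```

Whether a given build actually honours the exported variable depends on its Makefiles, which is the concern raised in the comment above.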

ebolyen commented 5 years ago

Worth noting that if you can get RcppParallel to build correctly, DADA2 works perfectly fine. So I don't think there's any code or build changes needed for DADA2. (Unless we are completely off track here, which is possible.)

epruesse commented 5 years ago

That's great! I guess I was wrong looking into code in dada2 then. Means we need to check into everything linking TBB though. If lifetime optimizations in GCC7 break TBB, all dependencies need to make sure to use the right settings.

kguay commented 5 years ago

You guys rock. Thank you so much for your perseverance on this! Have a great weekend.

epruesse commented 5 years ago

Ah, hadn't seen that RcppParallel comes with a copy of TBB. Wonder whether that's necessary or even good. Wish there was some clearer info on whether the header library part of TBB is affected at all, or whether it's only important while building that DSO.

@ebolyen This needs to be fixed at https://github.com/conda-forge/r-rcppparallel-feedstock. With three Bioconda/core members on the package maintainer list, merging the PR quickly shouldn't be an issue. The usual "how do we avoid old packages" problem will remain though; we might need conda-forge/core to help there.

benjjneb commented 5 years ago

This is really exciting progress and it looks like I should stop my unsuccessful hacking around with the multi-threaded code on my local copy of dada2.

thermokarst commented 5 years ago

@ebolyen and I will pick this up again on Tuesday and see if we can get the TBB makefiles to play nicely with our conda-supplied gcc. Stay tuned!

thermokarst commented 5 years ago

Good news! We have a working solution, with a PR up on conda-forge. We suspect there is a cleaner, more idiomatic way to accomplish the same result, but this does appear to fix the problem (at least to the best of our knowledge). Stay tuned!

thermokarst commented 5 years ago

Good news, the build for r-rcppparallel was updated on conda-forge, and seems to be working as expected. I installed DADA2 using:

 conda create -n dada2-check -c conda-forge -c bioconda -c defaults --override-channels bioconductor-dada2

This picked up:

r-rcppparallel     conda-forge/linux-64::r-rcppparallel-4.4.1-r351h0357c0b_1001

in particular, build 1001 is the build with the fixes in it.
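
To check an existing environment against this, the build number is the part of the build string after the last underscore. A small sketch follows; the `build_number` helper is hypothetical, and the build string is the one quoted above.

```shell
# Hypothetical helper: print the build number, i.e. the text after the last underscore.
build_number() {
    printf '%s\n' "${1##*_}"
}

# The installed build string can be read from, for example:
#   conda list -n dada2-check r-rcppparallel
build_number "r351h0357c0b_1001"   # prints 1001
```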

@apcamargo, @valscherz, @mworkentine, @brendanf , @pdcountway, and @kguay, if you update your envs to use this latest build of r-rcppparallel, do things work as expected? Our testing on this end looks okay... Thanks!

mworkentine commented 5 years ago

Working for me! Thanks everyone for all the hard work!

benjjneb commented 5 years ago

Confirmed that this now runs w/o segfault for me, on the same machine and running the same R code that segfaulted with the previous conda install.

pdcountway commented 5 years ago

Just confirming that I've made it past the second learnErrors step without segfaulting, with multithread=16. Thanks for all of your help!

benjjneb commented 5 years ago

I think that is enough concurrence to declare victory on this one!

Thanks again to @epruesse @ebolyen @thermokarst for tracking the cause of this down and solving it, and to @apcamargo @pdcountway @mworkentine @kguay @valscherz and others for the detailed reports that allowed that to happen.

valscherz commented 5 years ago

Hi,

Sorry to say that I have run into the same error again with the dada2 1.10.0 bioconda build (r351hf484d3e_0). The installed r-rcppparallel was the r35h0357c0b_0 build, which apparently came out on conda-forge just yesterday.

I will force the r-rcppparallel build which was proven to work (r-rcppparallel-4.4.1-r351h0357c0b_1001) to see if it helps and let you know.

EDIT: Indeed, adding "- r-rcppparallel==4.4.1[build=*_1001]" to my environment definition solved the error.
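
For anyone wanting to reproduce the pin, an environment definition along these lines should work. This is a sketch only: the file name and environment name are assumptions, while the channels and the pin string are the ones quoted in this thread.

```shell
# Write an environment definition that pins the known-good r-rcppparallel build.
# Assumed names: dada2-env.yml (file) and dada2-check (environment).
cat > dada2-env.yml <<'EOF'
name: dada2-check
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - bioconductor-dada2
  - r-rcppparallel==4.4.1[build=*_1001]
EOF

# Then create the environment from it:
#   conda env create -f dada2-env.yml
```

Pinning by build string is a stopgap: it keeps the environment on the working build until a corrected newer build is published.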

ebolyen commented 5 years ago

Oh gosh, thanks for the report! I'll put this on my todo list to try and confirm the new build isn't doing something silly again.

ebolyen commented 5 years ago

Looks like our original hack was removed in this PR. It also looks like there might be some new build machinery, or perhaps upstream has updated their Makefiles, but in any case I'm not too surprised we need to tinker with this again. cc @dbast thoughts on how you'd like to proceed?

ebolyen commented 5 years ago

Looks like the TBB build hasn't changed in this respect in RcppParallel, so the probing is still based on a $(shell gcc) call: https://github.com/RcppCore/RcppParallel/blob/master/src/tbb/build/linux.gcc.inc#L65
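
A quick way to see why a bare gcc probe can go wrong inside a conda build is to compare what `gcc` resolves to on the PATH with what `$CC` points at. The sketch below assumes the conda-supplied compiler is exposed through `$CC` rather than as a plain `gcc`; the `probe_compiler` function is hypothetical.

```shell
# Hypothetical helper: report what a bare `gcc` lookup and the CC variable resolve to.
probe_compiler() {
    if command -v gcc >/dev/null 2>&1; then
        echo "gcc on PATH: $(command -v gcc)"
    else
        echo "gcc on PATH: not found"
    fi
    echo "CC: ${CC:-unset}"
}

probe_compiler
```

If the two lines disagree, or `gcc` is not found at all, any Makefile logic that shells out to `gcc` is not inspecting the compiler that will actually build the code.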

dbast commented 5 years ago

Oh, sorry, it seems this was overlooked in my semi-automatic recipe update procedure. Looks like it has to be brought back.

ebolyen commented 5 years ago

Ah no worries, at least it's easy to put back in that case! I was slightly worried our syntax was breaking the (new?) windows build somehow :)

ebolyen commented 5 years ago

@valscherz, would you be able to test the latest build and confirm it's working again? @dbast should have corrected the issue in the latest PR.

valscherz commented 5 years ago

Hi @ebolyen. I can confirm it worked fine using the r-rcppparallel-4.4.3-r35h0357c0b_1.tar.bz2 build.

Thanks again. Take care

Valentin

sjanssen2 commented 4 years ago

Hey guys, reading this log here was extremely helpful. Thanks for this great documentation and for all your hard work figuring out what was going wrong!

swuyts commented 4 years ago

Hi all, it seems that this problem has reappeared in DADA2 v1.12.1. Could that be possible?

I just installed a new environment using the code above:

conda create -n dada2-check -c conda-forge -c bioconda -c defaults --override-channels bioconductor-dada2

That gives the following r-rcppparallel version, which throws the segmentation fault:

r-rcppparallel            4.4.4             r36h0357c0b_0    conda-forge

Or did I overlook something?

benjjneb commented 4 years ago

I hope not. There was a brief reversion at some point but it got fixed pretty quickly if I remember correctly. However, I didn't even realize that dada2 1.12.1 was on bioconda, so it might be being built incorrectly again.