dpryan79 / MethylDackel

A (mostly) universal methylation extractor for BS-seq experiments.
MIT License
160 stars 42 forks source link

MethylDackel core dump Ubintu 18.04 (problem happend again in issue #95) #99

Closed tahuh closed 3 years ago

tahuh commented 4 years ago

Hi @dpryan79

First, I have no idea why I cannot re-open my previous issue so I'm writing problem here. (I'm sorry for that)

As you suggested in closed issue #95(https://github.com/dpryan79/MethylDackel/issues/95#issue-619857769) I have tested 0.5.1 but still in the same problem

This is the full backtrace of GDB

Thread 2 "MethylDackel" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff263f700 (LWP 29417)]
cust_mplp_auto (iter=iter@entry=0x7fffec004e70, _tid=_tid@entry=0x7ffff263ee84, _pos=_pos@entry=0x7ffff263ee88, n_plp=n_plp@entry=0x7ffff263ee94, plp=plp@entry=0x7fffec056280) at pileup.c:288
288         if (iter->pos[i] == iter->min) {
(gdb) bt full
#0  cust_mplp_auto (iter=iter@entry=0x7fffec004e70, _tid=_tid@entry=0x7ffff263ee84, _pos=_pos@entry=0x7ffff263ee88, n_plp=n_plp@entry=0x7ffff263ee94, plp=plp@entry=0x7fffec056280) at pileup.c:288
        i = 0
        ret = 0
        new_min = 18446744073709551615
#1  0x000055555555f101 in extractCalls (foo=0x7fffffffde60) at extract.c:391
        config = 0x7fffffffde60
        hdr = 0x7fffec003fc0
        iter = 0x7fffec004e70
        ret = <optimized out>
        tid = 0
        pos = 0
        i = <optimized out>
        seqlen = 1000012
        type = <optimized out>
        rv = <optimized out>
        o = <optimized out>
        bedIdx = 0
        n_plp = 0
        strand = <optimized out>
        direction = <optimized out>
        nmethyl = <optimized out>
        nunmethyl = <optimized out>
        nOff = <optimized out>
        nVariant = <optimized out>
        localBin = 0
        localPos = 0
        localEnd = 1000001
        localTid = 0
        localPos2 = 0
        lastPos = 0
        nVariantPositions = 0
        plp = 0x7fffec056280
        seq = 0x7ffff1d4a010 'N' <repeats 200 times>...
        base = 65 'A'
        context = "HG"
        tnc = <optimized out>
        data = 0x7fffec056250
        lastCpG = 0x0
        lastCHG = 0x0
        os = 0x7fffec000b60
        os_CpG = 0x7fffec000b80
        os_CHG = 0x7fffec000ba0
        os_CHH = 0x7fffec000bc0
---Type <return> to continue, or q <return> to quit---
        fai = 0x7fffec001db0
        bai = 0x7fffec0050a0
        fp = 0x7fffec013a70
        __PRETTY_FUNCTION__ = "extractCalls"
#2  0x00007ffff7c29669 in start_thread (arg=<optimized out>) at pthread_create.c:479
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737260025600, -269718404055552016, 140737488346494, 140737488346495, 140737488346496, 140737260023744, 269688545975873520, 269700372400975856}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#3  0x00007ffff7614323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.

If you have pre-compiled binary of stable version, can you let me know?

Thanks.

dpryan79 commented 4 years ago

@tahuh Please use a 0.4 release

tahuh commented 4 years ago

@dpryan79

I have use 0.4 release as you suggested and still have same problem.

The GDB trace keep saying if (iter->pos[i] == iter->min) in pileup.c has problem.

My guess is iter->pos[i] is some invalid behavior(maybe iter->pos is not allocated). Does regions with empty reads can cause this problem? Since I have removed reads aligned on low mappabiltity regions from my bam file generating empty region in genome.

Version Check

MethylDackel: A tool for processing bisulfite sequencing alignments.
Version: 0.4.0 (using HTSlib version 1.10.2-51-g0bff7c1)
Usage: MethylDackel <command> [options]

Commands:
    mbias    Determine the position-dependent methylation bias in a dataset,
             producing diagnostic SVG images.
    extract  Extract methylation metrics from an alignment file in BAM/CRAM
             format.
    mergeContext   Combine single Cytosine metrics from 'MethylDackel extract' into
             per-CpG/CHG metrics.
    perRead  Generate a per-read methylation summary.

Error message in GDB

Starting program: /usr/local/bin/MethylDackel extract --CHG --CHH --cytosine_report /synbiodata/reference/ensemble/GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa AJS-EM20.sort.bam
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
writing to prefix:'AJS-EM20.sort'
[New Thread 0x7ffff69d6700 (LWP 16684)]

Thread 2 "MethylDackel" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff69d6700 (LWP 16684)]
cust_mplp_auto (iter=iter@entry=0x7ffff00644b0, _tid=_tid@entry=0x7ffff69d5e84, _pos=_pos@entry=0x7ffff69d5e88, n_plp=n_plp@entry=0x7ffff69d5e94, plp=plp@entry=0x7ffff0063870) at pileup.c:288
288         if (iter->pos[i] == iter->min) {
(gdb) bt full
#0  cust_mplp_auto (iter=iter@entry=0x7ffff00644b0, _tid=_tid@entry=0x7ffff69d5e84, _pos=_pos@entry=0x7ffff69d5e88, n_plp=n_plp@entry=0x7ffff69d5e94, plp=plp@entry=0x7ffff0063870) at pileup.c:288
        i = 0
        ret = 0
        new_min = 18446744073709551615
#1  0x000055555555e671 in extractCalls (foo=0x7fffffffdf10) at extract.c:385
        config = 0x7fffffffdf10
        hdr = 0x7ffff00162a0
        iter = 0x7ffff00644b0
        ret = <optimized out>
        tid = 0
        pos = 0
        i = <optimized out>
        seqlen = 1000012
        type = <optimized out>
        rv = <optimized out>
        o = <optimized out>
        bedIdx = 0
        n_plp = 0
        strand = <optimized out>
        direction = <optimized out>
        nmethyl = <optimized out>
        nunmethyl = <optimized out>
        nOff = <optimized out>
        nVariant = <optimized out>
        localBin = 0
        localPos = 0
        localEnd = 1000001
        localTid = 0
        localPos2 = 0
        lastPos = 0
        nVariantPositions = 0
        plp = 0x7ffff0063870
        seq = 0x7ffff588c010 'N' <repeats 200 times>...
        base = 65 'A'
        context = "HG"
        tnc = <optimized out>
        data = 0x7ffff0063840
        lastCpG = 0x0
        lastCHG = 0x0
        os = 0x7ffff0000b60
        os_CpG = 0x7ffff0000b80
        os_CHG = 0x7ffff0000ba0
        os_CHH = 0x7ffff0000bc0
---Type <return> to continue, or q <return> to quit---
        fai = 0x7ffff0001db0
        bai = 0x7ffff0017380
        fp = 0x7ffff0013880
        __PRETTY_FUNCTION__ = "extractCalls"
#2  0x00007ffff7c29669 in start_thread (arg=<optimized out>) at pthread_create.c:479
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737330898688, -5472056830547680361, 140737488346718, 140737488346719, 140737488346720, 140737330896832, 5472071960625456023, 5472074530859426711}, 
              mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#3  0x00007ffff7893323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
(gdb) 
dpryan79 commented 4 years ago

@tahuh Can you privately make that BAM file available? That will make the cause of this easier to debug. I suspect there's a buffer overflow in the code somewhere that's triggering this.

tahuh commented 4 years ago

@dpryan79

Sorry for late reply.

Due to the large file size of the bam file I have (~37.5G), I can not directly transfer data for you.

Is there any options that you suggest?

dpryan79 commented 4 years ago

@tahuh Can you reproduce this with a subset of the data? Maybe a single chromosome or even a couple megabases in it?

MarWoes commented 4 years ago

I'm also experiencing this issue for a toy dataset that contains subsampled reads from chromosome 22 with only chr22.fasta as reference (MethylDackel 0.5.1 from Bioconda on Debian 9, version 0.4.0 runs just fine).

I uploaded the test data here: https://uni-muenster.sciebo.de/s/gwAgkrO8JxhszLP

Thanks for developing MethylDackel, I hope that test data helps reproducing the error!

dpryan79 commented 4 years ago

Thanks @MarWoes I'll have a look at this later today!

dpryan79 commented 4 years ago

I can completely reproduce this with the binary distributed by conda, but oddly enough it doesn't happen with a regular local build. I'll make a new conda-built binary locally and track down the cause of this.

robertamezquita commented 4 years ago

Any update on this? Also encountering this same issue..

robertamezquita commented 4 years ago

Seems to potentially be an issue with HTSlib 1.10.2; HTSlib 1.9 works

tahuh commented 4 years ago

@robertamezquita

I tried as you mentioned and it works fine for me too.

I'm wondering why htslib 1.10 does not work but any way, I'm fine with this.

mmark27 commented 4 years ago

@robertamezquita

I tried as you mentioned and it works fine for me too.

I'm wondering why htslib 1.10 does not work but any way, I'm fine with this.

How do I specify the HTSlib to use? I used the bioconda dist installation. Running into the same error on the HTSlib 1.10. Do I need to uninstall the bioconda dist and do a local install of MethylDackel?

tahuh commented 4 years ago

@mmark27

I had compiled from the source and specified the htslib version (htslib-1.9) I wanted to use.

You do not have to uninstall the bioconda but try docker instead.

I made my own docker container and was successful.

@dpryan79 Do you have any plan of publishing MethylDackel on DockerHub?

Blosberg commented 4 years ago

I had compiled from the source and specified the htslib version (htslib-1.9) I wanted to use.

@tahuh , can you elaborate on what that entails? the steps I've tried are listed below: @dpryan79 , I'm trying to follow the procedure on the README,

----Edit tidied up this comment and cleared out my description of failed attempt to get things working with htslib-1.11 in order to focus on the problems with recommend htslib-1.9 and release 0.4.0 ----

I checked out commit 344b1be to have a 0.4.0 release, as recommended above, set PATH_HTS to my htslib-1.9 installation, and PATH_BIGWIG to my libBigWig installation. Compilation as follows:

make CFLAGS="-I${PATH_HTS}  -I${PATH_BIGWIG}"  LIBS="-L${PATH_HTS} -L${PATH_BIGWIG}" LIBBIGWIG=${PATH_BIGWIG}"libBigWig.a" prefix=$(pwd)
echo '#define VERSION "0.4.0"' > version.h
cc -I/[PATH_HTS]/  -I/[PATH_BIGWIG]/ -L/[PATH_HTS]/ -L/[PATH_BIGWIG]/ -o MethylDackel common.o bed.o svg.o pileup.o extract.o MBias.o mergeContext.o perRead.o main.c -lm -lz -lpthread -lhts

seems to compile, but then upon trying to run:

./MethylDackel: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory

Out of ideas at this point. If anybody has any suggestions I'd appreciate whatever help you can offer.

tahuh commented 4 years ago

@Blosberg

To be clear, I have created a docker image for MethylDackel.

You can find the docker file here .

Blosberg commented 4 years ago

To be clear, I have created a docker image for MethylDackel. You can find the docker file here .

Thanks for that. My alternative solution was to specify htslib=1.9 in my conda recipe. Specifying 1.11 lead to some strange behaviour with MethylDackel being built as Version: 0.3.0 (using HTSlib version 1.2.1) --which I didn't want to trust, but 1.9 seems to have resolved it. Thanks for your help.

ewels commented 3 years ago

Hi all,

Running into a similar issue trying to update Methyldackel to version 1.5.1 - (https://github.com/nf-core/methylseq/pull/185) see CI nextflow log:

Command executed:

  MethylDackel extract     genome.fa SRR389222_sub1.sorted.markDups.bam
  MethylDackel mbias   genome.fa SRR389222_sub1.sorted.markDups.bam SRR389222_sub1.sorted.markDups --txt > SRR389222_sub1.sorted.markDups_methyldackel.txt

Command exit status:
  139

Command output:
  (empty)

Command error:
  writing to prefix:'SRR389222_sub1.sorted.markDups'
  .command.sh: line 2:    27 Segmentation fault      (core dumped) MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam

I tried downgrading htslib to 1.9 as suggested here, but then my environment won't build - CI conda log here or a more readable output from mamba:

Problem: package methyldackel-0.5.1-h4eb3cf5_0 requires htslib >=1.10.2,<1.11.0a0, but none of the providers can be installed

We previously had Methyldackel 0.5.0 working fine, before I start downgrading all the packages in the environment again in an effort to get it back to a working state I wondered if any progress had been made on tracking down this error?

Cheers,

Phil

dpryan79 commented 3 years ago

@ewels Is that CI run on Ubuntu 20.04 and are the commands and input files available somewhere (this being nf-core, I guess I should really know that the answer is "of course the files are available!")? I have Ubuntu 18.04 locally, but could update the system if needed.

What happens if you downgrade MethylDackel to 0.4.0? The options should be identical, so it should be a drop-in replacement. That version predates some large contributed changes that seem to at least sometimes be the cause of segfaults (though I still have yet to track down exactly why/where).

ewels commented 3 years ago

Of course the files are available! 😆

Trying with downgrading just MethylDackel now (see CI test).

The last PR to be merged had tests passing here and that was with MethylDackel 0.5.0 (see env). So there's something odd going on here, either a change in 0.5.1 or more likely a change in a dependency / something upstream as I changed a bunch of other packages in the problematic PR...

jmarshall commented 3 years ago

In HTSlib 1.10, __bam_mplp_t acquired some extra fields (see samtools/htslib@eb2334e6e895982ba280dedc1a3884547089fd43), so you would need to apply the following to MethylDackel's pileup.c to work with current HTSlib (the struct tag renaming is required for 1.11 onwards):

diff --git a/pileup.c b/pileup.c
index 95ec767..b718b16 100644
--- a/pileup.c
+++ b/pileup.c
@@ -34,7 +34,7 @@ typedef struct {
 KHASH_MAP_INIT_STR(olap_hash, lbnode_t *)
 typedef khash_t(olap_hash) olap_hash_t;

-struct __bam_plp_t {
+struct bam_plp_s {
     mempool_t *mp;
     lbnode_t *head, *tail;
     int32_t tid, pos, max_tid, max_pos;
@@ -51,8 +51,9 @@ struct __bam_plp_t {
     int (*plp_destruct )(void *data, const bam1_t *b, bam_pileup_cd *cd);
 };

-struct __bam_mplp_t {
+struct bam_mplp_s {
     int n;
+    int32_t min_tid, *tid;
     uint64_t min, *pos;
     bam_plp_t *iter;
     int *n_plp;

This resolves the segfault when run on @MarWoes's test data from https://github.com/dpryan79/MethylDackel/issues/99#issuecomment-648914559.

You could I suppose wrap that in #if HTS_VERSION >= 101000 etc to keep it working with both older and newer HTSlib, but really trying to keep up with changes in HTSlib's internal structs is a fool's errand.

I didn't look into why you have your own pileup routines here, but if you were lucky the newer plp constructor stuff or other flexibility improvements might mean you didn't need to do that anymore; otherwise, it would be better to duplicate the pileup code entirely so that you were decoupled from changes in HTSlib's internal pileup code.

(The OP could not reopen their previous issue due to isaacs/github#583.)

dpryan79 commented 3 years ago

@jmarshall I think you've nicely pin-pointed the cause. I need to refresh my memory about exactly why I was doing what I was with pileup.c and either just redo that or fully copy the relevant portion from htslib. It's been a few years since I've looked at much depth at the code base here, so that's a good excuse to do some mental refreshing :)

dpryan79 commented 3 years ago

@ewels Until I have a proper fix for this supporting htslib 1.10 and 1.11 I've made a new build of MethylDackel 0.5.1 pinned to htslib 1.9. That should be available in bioconda in an hour or so, can you try that with the nextflow pipeline?

ewels commented 3 years ago

Still no joy 😞

https://github.com/nf-core/methylseq/pull/185/checks?check_run_id=1935046499#step:7:111

Error executing process > 'methyldackel (SRR389222_sub1)'

Caused by:
  Process `methyldackel (SRR389222_sub1)` terminated with an error exit status (139)

Command executed:

  MethylDackel extract     genome.fa SRR389222_sub1.sorted.markDups.bam
  MethylDackel mbias   genome.fa SRR389222_sub1.sorted.markDups.bam SRR389222_sub1.sorted.markDups --txt > SRR389222_sub1.sorted.markDups_methyldackel.txt

Command exit status:
  139

Command output:
  (empty)

Command error:
  writing to prefix:'SRR389222_sub1.sorted.markDups'
  .command.sh: line 2:    26 Segmentation fault      (core dumped) MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam

Ah but it looks like it's building with htslib-1.10.2 still though. Did I not wait long enough?

https://github.com/nf-core/methylseq/pull/185/checks?check_run_id=1935046499#step:4:142

ewels commented 3 years ago

Looking again, I can see that methyldackel-0.5.1-hee625c5_0.tar.bz2 was used and not the new 0.5.1--hed50d52_1 so I'm pretty sure that this is what the issue was. I'll try restarting the jobs this evening.

ewels commented 3 years ago

Ok I tried pinning the build as well as the version but then the environment fails. I can never read the conda failures but I guess that there is a conflict in one of the other packages. I've tried removing the version pin from samtools.

Soon with Nextflow DSL2 we will move to 1 process = 1 container / environment. Going to make this stuff so much easier! 😅

dpryan79 commented 3 years ago

Oof, I'd forgotten that the DSL1 didn't allow multiple containers, that's a real pain. I'll take this as some pressure to get things properly fixed (I originally tried the patch above, which does cause it to run through, but then the CI testing failed on it due to the output being wrong, so there's more to fix).

ewels commented 3 years ago

DSL1 does allow multiple containers, we just chose not to go that way at first with @nf-core pipelines (smaller total singularity size, ability to collect versions for all tools in a single process). But we're totally rewriting for DSL2 to have shared software wrappers and will change this approach.

After a lot of trial and error (pushing a tonne of commits and letting the CI test them all in parallel actually worked quite well) I think I got to a working environment though! Had to keep Samtools at 1.9 and preseq at v2 but managed to get everything else up to date and working now I think 🎉

dpryan79 commented 3 years ago

Great, that's something at least.

dpryan79 commented 3 years ago

Version 0.5.2 contains a reimplementation of the paired-end overlap detection functionality (this is slightly different than what comes with htslib, since it needs to take BS-seq strandedness into account), which was the root cause of the incompatibility with htslib 1.10+. The constructor/destructor functionality pointed out by @jmarshall really helped to clean things up in this regard. Note that 0.5.2 is currently only compatible with htslib 1.11 and not earlier versions, due to https://github.com/samtools/htslib/pull/1127. I hope this issue can now be closed for good :)

ewels commented 3 years ago

Fantastic, thank you @dpryan79 !