Closed tahuh closed 3 years ago
@tahuh Please use a 0.4 release
@dpryan79
I have use 0.4 release as you suggested and still have same problem.
The GDB trace keep saying if (iter->pos[i] == iter->min)
in pileup.c
has problem.
My guess is iter->pos[i] is some invalid behavior(maybe iter->pos is not allocated). Does regions with empty reads can cause this problem? Since I have removed reads aligned on low mappabiltity regions from my bam file generating empty region in genome.
Version Check
MethylDackel: A tool for processing bisulfite sequencing alignments.
Version: 0.4.0 (using HTSlib version 1.10.2-51-g0bff7c1)
Usage: MethylDackel <command> [options]
Commands:
mbias Determine the position-dependent methylation bias in a dataset,
producing diagnostic SVG images.
extract Extract methylation metrics from an alignment file in BAM/CRAM
format.
mergeContext Combine single Cytosine metrics from 'MethylDackel extract' into
per-CpG/CHG metrics.
perRead Generate a per-read methylation summary.
Error message in GDB
Starting program: /usr/local/bin/MethylDackel extract --CHG --CHH --cytosine_report /synbiodata/reference/ensemble/GRCh37/Homo_sapiens.GRCh37.dna.primary_assembly.fa AJS-EM20.sort.bam
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
writing to prefix:'AJS-EM20.sort'
[New Thread 0x7ffff69d6700 (LWP 16684)]
Thread 2 "MethylDackel" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff69d6700 (LWP 16684)]
cust_mplp_auto (iter=iter@entry=0x7ffff00644b0, _tid=_tid@entry=0x7ffff69d5e84, _pos=_pos@entry=0x7ffff69d5e88, n_plp=n_plp@entry=0x7ffff69d5e94, plp=plp@entry=0x7ffff0063870) at pileup.c:288
288 if (iter->pos[i] == iter->min) {
(gdb) bt full
#0 cust_mplp_auto (iter=iter@entry=0x7ffff00644b0, _tid=_tid@entry=0x7ffff69d5e84, _pos=_pos@entry=0x7ffff69d5e88, n_plp=n_plp@entry=0x7ffff69d5e94, plp=plp@entry=0x7ffff0063870) at pileup.c:288
i = 0
ret = 0
new_min = 18446744073709551615
#1 0x000055555555e671 in extractCalls (foo=0x7fffffffdf10) at extract.c:385
config = 0x7fffffffdf10
hdr = 0x7ffff00162a0
iter = 0x7ffff00644b0
ret = <optimized out>
tid = 0
pos = 0
i = <optimized out>
seqlen = 1000012
type = <optimized out>
rv = <optimized out>
o = <optimized out>
bedIdx = 0
n_plp = 0
strand = <optimized out>
direction = <optimized out>
nmethyl = <optimized out>
nunmethyl = <optimized out>
nOff = <optimized out>
nVariant = <optimized out>
localBin = 0
localPos = 0
localEnd = 1000001
localTid = 0
localPos2 = 0
lastPos = 0
nVariantPositions = 0
plp = 0x7ffff0063870
seq = 0x7ffff588c010 'N' <repeats 200 times>...
base = 65 'A'
context = "HG"
tnc = <optimized out>
data = 0x7ffff0063840
lastCpG = 0x0
lastCHG = 0x0
os = 0x7ffff0000b60
os_CpG = 0x7ffff0000b80
os_CHG = 0x7ffff0000ba0
os_CHH = 0x7ffff0000bc0
---Type <return> to continue, or q <return> to quit---
fai = 0x7ffff0001db0
bai = 0x7ffff0017380
fp = 0x7ffff0013880
__PRETTY_FUNCTION__ = "extractCalls"
#2 0x00007ffff7c29669 in start_thread (arg=<optimized out>) at pthread_create.c:479
ret = <optimized out>
pd = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737330898688, -5472056830547680361, 140737488346718, 140737488346719, 140737488346720, 140737330896832, 5472071960625456023, 5472074530859426711},
mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 0
#3 0x00007ffff7893323 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
(gdb)
@tahuh Can you privately make that BAM file available? That will make the cause of this easier to debug. I suspect there's a buffer overflow in the code somewhere that's triggering this.
@dpryan79
Sorry for late reply.
Due to the large file size of the bam file I have (~37.5G), I can not directly transfer data for you.
Is there any options that you suggest?
@tahuh Can you reproduce this with a subset of the data? Maybe a single chromosome or even a couple megabases in it?
I'm also experiencing this issue for a toy dataset that contains subsampled reads from chromosome 22 with only chr22.fasta as reference (MethylDackel 0.5.1 from Bioconda on Debian 9, version 0.4.0 runs just fine).
I uploaded the test data here: https://uni-muenster.sciebo.de/s/gwAgkrO8JxhszLP
Thanks for developing MethylDackel, I hope that test data helps reproducing the error!
Thanks @MarWoes I'll have a look at this later today!
I can completely reproduce this with the binary distributed by conda, but oddly enough it doesn't happen with a regular local build. I'll make a new conda-built binary locally and track down the cause of this.
Any update on this? Also encountering this same issue..
Seems to potentially be an issue with HTSlib 1.10.2; HTSlib 1.9 works
@robertamezquita
I tried as you mentioned and it works fine for me too.
I'm wondering why htslib 1.10 does not work but any way, I'm fine with this.
@robertamezquita
I tried as you mentioned and it works fine for me too.
I'm wondering why htslib 1.10 does not work but any way, I'm fine with this.
How do I specify the HTSlib to use? I used the bioconda dist installation. Running into the same error on the HTSlib 1.10. Do I need to uninstall the bioconda dist and do a local install of MethylDackel?
@mmark27
I had compiled from the source and specified the htslib version (htslib-1.9
) I wanted to use.
You do not have to uninstall the bioconda but try docker
instead.
I made my own docker container and was successful.
@dpryan79 Do you have any plan of publishing MethylDackel on DockerHub?
I had compiled from the source and specified the htslib version (
htslib-1.9
) I wanted to use.
@tahuh , can you elaborate on what that entails? the steps I've tried are listed below: @dpryan79 , I'm trying to follow the procedure on the README,
----Edit tidied up this comment and cleared out my description of failed attempt to get things working with htslib-1.11 in order to focus on the problems with recommend htslib-1.9 and release 0.4.0 ----
I checked out commit 344b1be
to have a 0.4.0 release, as recommended above, set PATH_HTS to my htslib-1.9 installation, and PATH_BIGWIG to my libBigWig installation. Compilation as follows:
make CFLAGS="-I${PATH_HTS} -I${PATH_BIGWIG}" LIBS="-L${PATH_HTS} -L${PATH_BIGWIG}" LIBBIGWIG=${PATH_BIGWIG}"libBigWig.a" prefix=$(pwd)
echo '#define VERSION "0.4.0"' > version.h
cc -I/[PATH_HTS]/ -I/[PATH_BIGWIG]/ -L/[PATH_HTS]/ -L/[PATH_BIGWIG]/ -o MethylDackel common.o bed.o svg.o pileup.o extract.o MBias.o mergeContext.o perRead.o main.c -lm -lz -lpthread -lhts
seems to compile, but then upon trying to run:
./MethylDackel: error while loading shared libraries: libhts.so.2: cannot open shared object file: No such file or directory
Out of ideas at this point. If anybody has any suggestions I'd appreciate whatever help you can offer.
@Blosberg
To be clear, I have created a docker image for MethylDackel.
You can find the docker file here .
To be clear, I have created a docker image for MethylDackel. You can find the docker file here .
Thanks for that. My alternative solution was to specify htslib=1.9
in my conda recipe. Specifying 1.11 lead to some strange behaviour with MethylDackel being built as Version: 0.3.0 (using HTSlib version 1.2.1)
--which I didn't want to trust, but 1.9 seems to have resolved it. Thanks for your help.
Hi all,
Running into a similar issue trying to update Methyldackel to version 1.5.1 - (https://github.com/nf-core/methylseq/pull/185) see CI nextflow log:
Command executed:
MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam
MethylDackel mbias genome.fa SRR389222_sub1.sorted.markDups.bam SRR389222_sub1.sorted.markDups --txt > SRR389222_sub1.sorted.markDups_methyldackel.txt
Command exit status:
139
Command output:
(empty)
Command error:
writing to prefix:'SRR389222_sub1.sorted.markDups'
.command.sh: line 2: 27 Segmentation fault (core dumped) MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam
I tried downgrading htslib to 1.9 as suggested here, but then my environment won't build - CI conda
log here or a more readable output from mamba
:
Problem: package methyldackel-0.5.1-h4eb3cf5_0 requires htslib >=1.10.2,<1.11.0a0, but none of the providers can be installed
We previously had Methyldackel 0.5.0 working fine, before I start downgrading all the packages in the environment again in an effort to get it back to a working state I wondered if any progress had been made on tracking down this error?
Cheers,
Phil
@ewels Is that CI run on Ubuntu 20.04 and are the commands and input files available somewhere (this being nf-core, I guess I should really know that the answer is "of course the files are available!")? I have Ubuntu 18.04 locally, but could update the system if needed.
What happens if you downgrade MethylDackel to 0.4.0? The options should be identical, so it should be a drop-in replacement. That version predates some large contributed changes that seem to at least sometimes be the cause of segfaults (though I still have yet to track down exactly why/where).
Of course the files are available! 😆
Trying with downgrading just MethylDackel now (see CI test).
The last PR to be merged had tests passing here and that was with MethylDackel 0.5.0 (see env). So there's something odd going on here, either a change in 0.5.1 or more likely a change in a dependency / something upstream as I changed a bunch of other packages in the problematic PR...
In HTSlib 1.10, __bam_mplp_t
acquired some extra fields (see samtools/htslib@eb2334e6e895982ba280dedc1a3884547089fd43), so you would need to apply the following to MethylDackel's pileup.c to work with current HTSlib (the struct tag renaming is required for 1.11 onwards):
diff --git a/pileup.c b/pileup.c
index 95ec767..b718b16 100644
--- a/pileup.c
+++ b/pileup.c
@@ -34,7 +34,7 @@ typedef struct {
KHASH_MAP_INIT_STR(olap_hash, lbnode_t *)
typedef khash_t(olap_hash) olap_hash_t;
-struct __bam_plp_t {
+struct bam_plp_s {
mempool_t *mp;
lbnode_t *head, *tail;
int32_t tid, pos, max_tid, max_pos;
@@ -51,8 +51,9 @@ struct __bam_plp_t {
int (*plp_destruct )(void *data, const bam1_t *b, bam_pileup_cd *cd);
};
-struct __bam_mplp_t {
+struct bam_mplp_s {
int n;
+ int32_t min_tid, *tid;
uint64_t min, *pos;
bam_plp_t *iter;
int *n_plp;
This resolves the segfault when run on @MarWoes's test data from https://github.com/dpryan79/MethylDackel/issues/99#issuecomment-648914559.
You could I suppose wrap that in #if HTS_VERSION >= 101000
etc to keep it working with both older and newer HTSlib, but really trying to keep up with changes in HTSlib's internal structs is a fool's errand.
I didn't look into why you have your own pileup routines here, but if you were lucky the newer plp constructor stuff or other flexibility improvements might mean you didn't need to do that anymore; otherwise, it would be better to duplicate the pileup code entirely so that you were decoupled from changes in HTSlib's internal pileup code.
(The OP could not reopen their previous issue due to isaacs/github#583.)
@jmarshall I think you've nicely pin-pointed the cause. I need to refresh my memory about exactly why I was doing what I was with pileup.c and either just redo that or fully copy the relevant portion from htslib. It's been a few years since I've looked at much depth at the code base here, so that's a good excuse to do some mental refreshing :)
@ewels Until I have a proper fix for this supporting htslib 1.10 and 1.11 I've made a new build of MethylDackel 0.5.1 pinned to htslib 1.9. That should be available in bioconda in an hour or so, can you try that with the nextflow pipeline?
Still no joy 😞
https://github.com/nf-core/methylseq/pull/185/checks?check_run_id=1935046499#step:7:111
Error executing process > 'methyldackel (SRR389222_sub1)'
Caused by:
Process `methyldackel (SRR389222_sub1)` terminated with an error exit status (139)
Command executed:
MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam
MethylDackel mbias genome.fa SRR389222_sub1.sorted.markDups.bam SRR389222_sub1.sorted.markDups --txt > SRR389222_sub1.sorted.markDups_methyldackel.txt
Command exit status:
139
Command output:
(empty)
Command error:
writing to prefix:'SRR389222_sub1.sorted.markDups'
.command.sh: line 2: 26 Segmentation fault (core dumped) MethylDackel extract genome.fa SRR389222_sub1.sorted.markDups.bam
Ah but it looks like it's building with htslib-1.10.2
still though. Did I not wait long enough?
https://github.com/nf-core/methylseq/pull/185/checks?check_run_id=1935046499#step:4:142
Looking again, I can see that methyldackel-0.5.1-hee625c5_0.tar.bz2
was used and not the new 0.5.1--hed50d52_1
so I'm pretty sure that this is what the issue was. I'll try restarting the jobs this evening.
Ok I tried pinning the build as well as the version but then the environment fails. I can never read the conda failures but I guess that there is a conflict in one of the other packages. I've tried removing the version pin from samtools.
Soon with Nextflow DSL2 we will move to 1 process = 1 container / environment. Going to make this stuff so much easier! 😅
Oof, I'd forgotten that the DSL1 didn't allow multiple containers, that's a real pain. I'll take this as some pressure to get things properly fixed (I originally tried the patch above, which does cause it to run through, but then the CI testing failed on it due to the output being wrong, so there's more to fix).
DSL1 does allow multiple containers, we just chose not to go that way at first with @nf-core pipelines (smaller total singularity size, ability to collect versions for all tools in a single process). But we're totally rewriting for DSL2 to have shared software wrappers and will change this approach.
After a lot of trial and error (pushing a tonne of commits and letting the CI test them all in parallel actually worked quite well) I think I got to a working environment though! Had to keep Samtools at 1.9 and preseq at v2 but managed to get everything else up to date and working now I think 🎉
Great, that's something at least.
Version 0.5.2 contains a reimplementation of the paired-end overlap detection functionality (this is slightly different than what comes with htslib, since it needs to take BS-seq strandedness into account), which was the root cause of the incompatibility with htslib 1.10+. The constructor/destructor functionality pointed out by @jmarshall really helped to clean things up in this regard. Note that 0.5.2 is currently only compatible with htslib 1.11 and not earlier versions, due to https://github.com/samtools/htslib/pull/1127. I hope this issue can now be closed for good :)
Fantastic, thank you @dpryan79 !
Hi @dpryan79
First, I have no idea why I cannot re-open my previous issue so I'm writing problem here. (I'm sorry for that)
As you suggested in closed issue #95(https://github.com/dpryan79/MethylDackel/issues/95#issue-619857769) I have tested 0.5.1 but still in the same problem
This is the full backtrace of GDB
If you have pre-compiled binary of stable version, can you let me know?
Thanks.