CrazyHsu opened this issue 4 years ago
In the output you provided, the last print is for "statistic:" and it doesn't reach the print for "novel:" so it seems that the segfault is happening in the detect_novel function: https://github.com/Xinglab/rmats-turbo/blob/master/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3070
I don't know specifically what is causing the segfault. Are you able to examine the core file with gdb, or reproduce the error while running under gdb?
Yes, I will try the gdb approach you mentioned above. I also found that someone has come across the same problem; it was discussed in https://groups.google.com/forum/#!topic/rmats-user-group/aewShbUjdcI , but it seems they didn't reach a clear conclusion. It's also worth mentioning that my research species is wheat, whose genome is very large, which may be the cause of the segfault!
Hi @EricKutschera , sorry, I am unfamiliar with gdb; although I have read its usage docs, I still don't know how to use it with rmats to reproduce the error. When I looked for the cause of the problem on the internet, I found that many people think it may be the program accessing a memory location it is not allowed to access. I think the wheat genome may be too large, which leads to the segmentation fault in Cython.
I have encountered the segmentation fault (core dumped) problem before, but in a later step (the post step), and the reason was that the program exceeded the maximum memory. Based on your output, it seems this segmentation fault occurred during BAM processing. Other than requesting more memory, another possible solution I can think of is to run the prep step for each BAM file separately, if you have multiple BAMs in b1/b2.
I am having a similar problem: running on more than 1 thread results in a segmentation fault. The stdout after I start rmats-turbo is below. Running on 1 thread is not ideal, so I'm hopeful there is a solution.
gtf: 31.38361096382141
There are 60498 distinct gene ID in the gtf file
There are 198619 distinct transcript ID in the gtf file
There are 38759 one-transcript genes in the gtf file
There are 1173255 exons in the gtf file
There are 27204 one-exon transcripts in the gtf file
There are 25061 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.283067
Average number of exons per transcript is 5.907063
Average number of exons per transcript excluding one-exon tx is 6.685827
Average number of gene per geneGroup is 7.613305
statistic: 0.05198049545288086
novel: 2472.4337520599365
The splicing graph and candidate read have been saved into /tmp/2020-09-09-13:06:48_212692.rmats
save: 18.24647068977356
loadsg: 2.350325107574463
==========
Done processing each gene from dictionary to compile AS events
Found 93902 exon skipping events
Found 13344 exon MX events
Found 16344 alt SS events
There are 9905 alt 3 SS events and 6439 alt 5 SS events.
Found 6073 RI events
==========
ase: 4.146276473999023
Segmentation fault (core dumped)
I'm also getting consistent segfaults with the newest rmats-turbo version, uniquely when including the --novelSS parameter.
The command I used:
python rmats.py --b1 '/media/nus/AMLab_Ext1/Shen/splicing/try/Wr60_DS.txt' --b2 '/media/nus/AMLab_Ext1/Shen/splicing/try/Wr60_WS.txt' --gtf '/media/nus/AMLab_Ext1/Shen/splicing/try/converted.gtf' -t single --readLength 150 --variable-read-length --nthread 12 --novelSS --od '/media/nus/AMLab_Ext1/Shen/splicing/try/output' --tmp '/media/nus/AMLab_Ext1/Shen/splicing/try/temp_output'
I got this in a later step:
==========
Done processing each gene from dictionary to compile AS events
Found 22096 exon skipping events
Found 8794 exon MX events
Found 45794 alt SS events
There are 23930 alt 3 SS events and 21864 alt 5 SS events.
Found 3045 RI events
==========
ase: 4.793232679367065
Segmentation fault (core dumped)
It could be that the segfault occurs because there is not enough memory available on the machine. You could try watching the memory usage (maybe with htop) while rmats is running and you might be able to see the memory increasing.
You could also try running under gdb so that you can get a stack trace when the segfault occurs. You could try to use gdb as described in this comment: https://github.com/Xinglab/rmats-turbo/issues/62#issuecomment-737235215
If insufficient memory turns out to be the issue, then you can try running rmats in a way that uses memory more efficiently. rmats includes a prep and a post step. In the prep step each thread processes 1 BAM file at a time. The prep step creates 1 or more .rmats files. In the post step each thread processes 1 .rmats file at a time. If you use rmats code after this change https://github.com/Xinglab/rmats-turbo/pull/53 then each BAM file will have its own .rmats file. Before that change there was 1 .rmats file per prep step. By default rmats runs with --task both, which runs a single prep step immediately followed by a post step. You can make things more efficient by following https://github.com/Xinglab/rmats-turbo#running-prep-and-post-separately but running 1 prep step per BAM file, so that there is 1 .rmats file per BAM even without https://github.com/Xinglab/rmats-turbo/pull/53
If there are multiple files that can be processed concurrently by multiple threads, then more threads will use more memory. You can choose a smaller value for --nthread to reduce the memory usage.
The --novelSS option allows rmats to check for additional events, but it may also increase the memory usage.
I will try to go back through my analysis in a few weeks and troubleshoot with gdb. Also, just to note, I originally ran this on a machine with almost 2TB of RAM.
I also had similar problems at the same steps; this is my gdb debugging output:
Starting program: /home/xxx/anaconda3/envs/rmats/bin/python ./rmats.py --b1 ./control.txt --b2 ./sh.txt --nthread 8 --variable-read-length -t paired --gtf /home/xxx/data/raw_ref/hg38_ucsc.annotated.gtf --od ./splicing_out/ --tmp ./temp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
gtf: 24.5085580349
There are 51276 distinct gene ID in the gtf file
There are 52638 distinct transcript ID in the gtf file
There are 50570 one-transcript genes in the gtf file
There are 542154 exons in the gtf file
There are 4654 one-exon transcripts in the gtf file
There are 3903 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 1.026562
Average number of exons per transcript is 10.299669
Average number of exons per transcript excluding one-exon tx is 11.201651
Average number of gene per geneGroup is 465.110721
statistic: 0.0104548931122
[New Thread 0x7ffff66eb700 (LWP 588062)]
[New Thread 0x7ffff5eea700 (LWP 588063)]
[New Thread 0x7ffff56e9700 (LWP 588064)]
[New Thread 0x7ffff4ee8700 (LWP 588065)]
[New Thread 0x7ffff46e7700 (LWP 588066)]
[New Thread 0x7ffff3ee6700 (LWP 588067)]
[New Thread 0x7ffff36e5700 (LWP 588068)]
novel: 7860.30525613
The splicing graph and candidate read have been saved into ./temp/2021-06-03-21:27:47_306642.rmats
save: 19.7066500187
WARNING: there are redundant temporary files.
loadsg: 5.16925311089
ase: 2.03069710732
Thread 5 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4ee8700 (LWP 588065)]
0x00007ffff6d983a7 in std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits
I hope this may help solve the problem
The gdb output is long but there's not much information in it. It essentially says that the segfault occurred when adding a new element to a map:
map<string, map<vector<pair<long, long>>, int>>::operator[](string)
At the gdb prompt, after the segfault happens, you might be able to run bt (back trace) to get more details. Also, if you can build rmats from source then you can get better gdb output by changing the compile flags. Replacing '-O3' with '-O0', '-ggdb' should let gdb show more details: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/setup.py#L16
Since your output includes ase: {...}, rmats is up to counting the reads toward the detected splice junctions. My guess is that the error occurred at this line, where the junction reads for a bam file are being read from a file: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3579
I think that the only reason map::operator[] would segfault is if it could not allocate memory to add the new element. If you are able to run rmats again while watching the memory usage with htop, then I expect you will see that the memory limit is reached.
I'm getting an error using test data only when specifying threads. If I set the threads=1 then I no longer have the error.
Has anyone figured out this issue?
I ran gdb as suggested.
Here is the error that I get when nthread is greater than 1.
$ gdb python3
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-conda-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(gdb) r /local/cluster/rmats-4.1.2/bin/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
Starting program: /local/cluster/rmats-4.1.2/bin/python3 /local/cluster/rmats-4.1.2/bin/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
gtf: 0.06883001327514648
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.00037789344787597656
[New Thread 0x7f4d0282e700 (LWP 30825)]
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
std::_Hash_bytes (ptr=<optimized out>, len=139968847681464, seed=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/hash_bytes.cc:147
147 ../../../../libstdc++-v3/libsupc++/hash_bytes.cc: No such file or directory.
Compiled using devtoolset-7 on CentOS and got this error instead:
gtf: 0.07103800773620605
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.000423431396484375
[New Thread 0x7fffee668700 (LWP 45920)]
Program received signal SIGSEGV, Segmentation fault.
operator() (this=this@entry=0x7fffffff9b40, __s=...)
at /local/cluster/centos/devtoolset-7/root/usr/include/c++/7/bits/basic_string.h:6401
6401 { return std::_Hash_impl::hash(__s.data(), __s.length()); }
The gdb output shows the segfault in std::_Hash_bytes (ptr=<optimized out>, len=139968847681464, seed=<optimized out>) and then std::_Hash_impl::hash(__s.data(), __s.length()). My guess is that somehow a string is null and is then getting hashed during a map lookup, but I don't know where in the code that could happen.
It looks like you were able to compile rmats yourself. Can you try compiling after changing these flags: https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/setup.py#L16
Use: extra_compile_args = ['-O0', '-ggdb', '-funroll-loops',
instead of: extra_compile_args = ['-O3', '-funroll-loops',
Then when you run rmats under gdb and get the segfault, hopefully you can get more detailed output.
I've attached my gdb output from a build with those debug flags and also an error intentionally introduced in the code. Here are the gdb commands I ran after the segfault to get a backtrace for all threads (shown in the attached file):
(gdb) i threads : to show all threads
(gdb) t 1 : to switch to thread 1
(gdb) bt : to get the backtrace
Then repeated for t 2 and t 3.
gdb_output.txt
Also what commit are you building from?
I built from the latest release tar.gz.
I had to add -lrt -lquadmath to the LDFLAGS in the rMATS_C/Makefile to get it to work.
I'm on CentOS 7 if that matters.
I'll add those compiler flags and get back to you.
This error is from a version I compiled in a conda environment with GCC version 7.5.0. I see there is some reference to a conda file path /home/conda/feedstock_root... that isn't present.
The system GCC of CentOS 7 is only 4.8.5, so I have to do some sort of workaround. I can load a CentOS environment (devtoolset-7), as I mentioned, to use GCC 7. The version I compiled under devtoolset-7 is the one that gave the basic_string.h error.
I wonder if this is a library linkage issue rather than an issue with the code. It's somewhat odd to me that it only shows up under a threaded environment though.
Ok, this seems like it might be an issue with Cython not finding the conda std and pthread libs. Looking into it more now. It seems to be pulling from the system std and pthread libs and they are not compatible... maybe.
Thanks for your help with the compiler flags.
I added the conda lib dir to LD_LIBRARY_PATH prior to building with cython. The error message changed slightly. I wonder if something still isn't linking properly.
In gdb_log_1.txt the segfault happens with std::_Hash_impl::hash (__ptr=0x0, ... which looks like the hash being called with a null pointer. The stack trace eventually goes up to __pyx_f_5rmats_13rmatspipeline_parse_bam ... at rmatspipeline/rmatspipeline.cpp:9056
Can you check that line in rMATS_pipeline/rmatspipeline/rmatspipeline.cpp? In the local build I did, that line in the .cpp has a comment pointing to https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L834
It might be that the rmats code at that line is not thread safe, but it only shows up for certain compilers/libraries?
Here is the section of that file with the comments above and below it.
9049 /* "rmatspipeline/rmatspipeline.pyx":834
9050 * while cg != geneGroup[i].end():
9051 * ## for each candidate gene
9052 * if ((supple[deref(cg)].chrom != bref_name # <<<<<<<<<<<<<<
9053 * or (supple[deref(cg)].strand != strand
9054 * and dt != FRUNSTRANDED)
9055 */
9056 __pyx_t_9 = (((__pyx_v_supple[(*__pyx_v_cg)]).chrom != __pyx_v_bref_name) != 0);
9057 if (!__pyx_t_9) {
9058 } else {
9059 __pyx_t_1 = __pyx_t_9;
9060 goto __pyx_L29_bool_binop_done;
9061 }
9062
9063 /* "rmatspipeline/rmatspipeline.pyx":835
9064 * ## for each candidate gene
9065 * if ((supple[deref(cg)].chrom != bref_name
9066 * or (supple[deref(cg)].strand != strand # <<<<<<<<<<<<<<
9067 * and dt != FRUNSTRANDED)
9068 * or visited.find(deref(cg)) != visited.end())):
9069 */
Can you try building with the code in this commit: https://github.com/Xinglab/rmats-turbo/commit/864b72d5debf391aaed01a7a0aae068281a34489
It seems like geneGroup may be getting modified while being used by multiple threads. The change should stop geneGroup from getting unnecessarily modified.
(gdb) r /local/cluster/rmats-4.1.2/rmats-turbo-864b72d/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
Starting program: /local/cluster/rmats-4.1.2/bin/python3 /local/cluster/rmats-4.1.2/rmats-turbo-864b72d/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
gtf: 0.0843348503112793
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.0005159378051757812
[New Thread 0x7f784b3a6700 (LWP 20131)]
read outcome totals across all BAMs
USED: 3621
NOT_PAIRED: 83367
NOT_NH_1: 20438
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 88143
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 15598
CLIPPED: 0
total: 211167
outcomes by BAM written to: /data/davised/bam_test/2022-09-06-14_18_51_977788_read_outcomes_by_bam.txt
novel: 0.45911240577697754
The splicing graph and candidate read have been saved into /data/davised/bam_test/2022-09-06-14_18_51_977788_*.rmats
save: 0.004267215728759766
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_49_03_121583_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_31_59_503379_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-30-15_14_56_182521_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-15_38_53_078401_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-16_12_14_096702_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_27_16_794782_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-16_15_46_613812_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_49_27_586852_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-12_01_38_111049_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-30-12_29_32_652946_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-18-02_11_05_263325_0.rmats
testData/231ESRP.25K.rep-1.bam found 7 times in .rmats files
testData/231ESRP.25K.rep-2.bam found 7 times in .rmats files
testData/231EV.25K.rep-1.bam found 7 times in .rmats files
testData/231EV.25K.rep-2.bam found 7 times in .rmats files
[Thread 0x7f784b3a6700 (LWP 20131) exited]
[Inferior 1 (process 20130) exited with code 01]
Wow! You fixed it. Thanks for working through that with me. Our users will be happy to hear things are working.
If I run into more issues I'll be back.
Here is my step-by-step guide for getting this built on a system that has a version 4 gcc:
#!/usr/bin/env bash
mamba create -c conda-forge -c bioconda \
-n rmats-4.1.2 \
python=3.6.12 \
cmake=3.15.4 \
fortran-compiler \
c-compiler \
libblas \
libcblas \
liblapack \
cython=0.29.21 \
numpy=1.16.6 \
r-base=3.6.3 \
gcc_impl_linux-64=7.5.0 \
libgcc-ng=7.5.0 \
libstdcxx-ng=7.5.0 \
libgomp=7.5.0 \
gdb
# run from the conda env prefix
mkdir -p etc/conda/activate.d etc/conda/deactivate.d
touch etc/conda/activate.d/LD_env_vars.sh
touch etc/conda/deactivate.d/LD_env_vars.sh
Here are my example contents; you'll need to edit with appropriate paths:
$ cat etc/conda/activate.d/LD_env_vars.sh
#!/bin/sh
export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/local/cluster/rmats-4.1.2/lib:${LD_LIBRARY_PATH}
$ cat etc/conda/deactivate.d/LD_env_vars.sh
#!/bin/sh
export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
unset OLD_LD_LIBRARY_PATH
git clone https://github.com/Xinglab/rmats-turbo.git
cd rmats-turbo
git checkout 864b72d
Then add -lrt -lquadmath to the LDFLAGS line in rMATS_C/Makefile.
conda activate rmats-4.1.2
#!/usr/bin/env bash
if [[ ! -d PAIRADISE ]]; then
git clone https://github.com/Xinglab/PAIRADISE.git || return 1
fi
Rscript install_r_deps.R || return 1
GSL_LDFLAGS="$(gsl-config --libs)" || return 1
GSL_CFLAGS="$(gsl-config --cflags)" || return 1
export GSL_LDFLAGS || return 1
export GSL_CFLAGS || return 1
make || return 1
chmod +x rmats.py
cd ../bin
ln -s ../rmats-turbo/rmats.py .
Then when you activate the conda env, you just need to run rmats.py and you will be good to go.
To simplify the process, I'd suggest collecting the R packages and installing them with conda as well. The R install script takes some time to run, and I think it would be faster to get all of the R dependencies installed in the conda env.
Just wanted to report back, quote from one of our users:
Thanks again for all the help with this and the mouse genome. The amount of high quality, new data we’re generating with rMATs is amazing!
I'm glad to have spent the time debugging with you. Thanks for providing that level of support for this software.
Cheers!
I tried using gdb to debug the rmatspipeline package, but what ultimately resolved my issue was changing the BAM files. I recommend using samtools stats to inspect the BAM files before importing them. As for the "core dumped" error despite having 1.4 TiB of free memory, it seems rmats isn't utilizing it properly.
Hi, I used rmats-turbo with more than 1 thread for my research; it crashed and threw a segmentation fault, but when I set the thread number to 1, it ran smoothly. How did this happen? Can you figure it out? Btw, it also happens with older released versions, like 4.0.2. The error information is listed below: