CrazyHsu opened this issue 4 years ago
In the output you provided, the last print is for "statistic:" and it doesn't reach the print for "novel:" so it seems that the segfault is happening in the detect_novel function: https://github.com/Xinglab/rmats-turbo/blob/master/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3070
I don't know specifically what is causing the segfault. Are you able to examine the core file with gdb, or reproduce the error while running under gdb?
Yes, I will try the gdb approach you mentioned above. I also found that someone has come across the same problem; it was discussed in https://groups.google.com/forum/#!topic/rmats-user-group/aewShbUjdcI , but it seems they didn't reach a clear conclusion. It's also worth mentioning that my research species is wheat, whose genome is very large, which may be the cause of the segfault!
Hi @EricKutschera , sorry, I am unfamiliar with gdb; although I have read its usage docs, I still don't know how to use it with rmats to reproduce the error. When I looked for the cause of the problem on the internet, I found that many people think it may be the program accessing a memory location it is not allowed to access. I think the wheat genome may be too large, which leads to the segmentation fault in Cython.
I have encountered the segmentation fault (core dumped) problem before, but in a later step (the post step), and the reason was that the program exceeded the maximum memory. Based on your output, it seems this segmentation fault occurred during BAM processing. Other than requesting more memory, another possible solution I can think of is to run the prep step for each BAM file separately, if you have multiple BAMs in b1/b2.
I am having a similar problem: running on more than 1 thread results in a segmentation fault. The stdout after I start rmats-turbo is below. Running on 1 thread is not ideal, so I'm hopeful there is a solution.
gtf: 31.38361096382141
There are 60498 distinct gene ID in the gtf file
There are 198619 distinct transcript ID in the gtf file
There are 38759 one-transcript genes in the gtf file
There are 1173255 exons in the gtf file
There are 27204 one-exon transcripts in the gtf file
There are 25061 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 3.283067
Average number of exons per transcript is 5.907063
Average number of exons per transcript excluding one-exon tx is 6.685827
Average number of gene per geneGroup is 7.613305
statistic: 0.05198049545288086
novel: 2472.4337520599365
The splicing graph and candidate read have been saved into /tmp/2020-09-09-13:06:48_212692.rmats
save: 18.24647068977356
loadsg: 2.350325107574463
==========
Done processing each gene from dictionary to compile AS events
Found 93902 exon skipping events
Found 13344 exon MX events
Found 16344 alt SS events
There are 9905 alt 3 SS events and 6439 alt 5 SS events.
Found 6073 RI events
==========
ase: 4.146276473999023
Segmentation fault (core dumped)
I'm also getting consistent segfaults with the newest rmats-turbo version, uniquely when including the --novelSS parameter.
The command I used:
python rmats.py --b1 '/media/nus/AMLab_Ext1/Shen/splicing/try/Wr60_DS.txt' --b2 '/media/nus/AMLab_Ext1/Shen/splicing/try/Wr60_WS.txt' --gtf '/media/nus/AMLab_Ext1/Shen/splicing/try/converted.gtf' -t single --readLength 150 --variable-read-length --nthread 12 --novelSS --od '/media/nus/AMLab_Ext1/Shen/splicing/try/output' --tmp '/media/nus/AMLab_Ext1/Shen/splicing/try/temp_output'
I got this in a later step:
==========
Done processing each gene from dictionary to compile AS events
Found 22096 exon skipping events
Found 8794 exon MX events
Found 45794 alt SS events
There are 23930 alt 3 SS events and 21864 alt 5 SS events.
Found 3045 RI events
==========
ase: 4.793232679367065
Segmentation fault (core dumped)
It could be that the segfault occurs because there is not enough memory available on the machine. You could try watching the memory usage (maybe with htop) while rmats is running and you might be able to see the memory increasing.
You could also try running under gdb so that you can get a stack trace when the segfault occurs. You could try to use gdb as described in this comment: https://github.com/Xinglab/rmats-turbo/issues/62#issuecomment-737235215
If insufficient memory turns out to be the issue, then you can try running rmats in a way that uses memory more efficiently. rmats includes a prep and a post step. In the prep step each thread processes 1 BAM file at a time. The prep step creates 1 or more .rmats files. In the post step each thread processes 1 .rmats file at a time. If you use rmats code after this change https://github.com/Xinglab/rmats-turbo/pull/53 then each BAM file will have its own .rmats file. Before that change there was 1 .rmats file per prep step. By default rmats runs with --task both, which runs a single prep step immediately followed by a post step. You can make things more efficient by following https://github.com/Xinglab/rmats-turbo#running-prep-and-post-separately but running 1 prep step per BAM file, so that there is 1 .rmats file per BAM even without https://github.com/Xinglab/rmats-turbo/pull/53
If there are multiple files that can be processed concurrently by multiple threads, then more threads will use more memory. You can choose a smaller value for --nthread to reduce the memory usage.
The --novelSS option allows rmats to check for additional events, but it may also increase the memory usage.
I will try to go back through my analysis in a few weeks and troubleshoot with gdb. Also, just to note, I originally ran this on a machine with almost 2TB of RAM.
I also had similar problems at the same steps; this is my gdb debugging output:
Starting program: /home/xxx/anaconda3/envs/rmats/bin/python ./rmats.py --b1 ./control.txt --b2 ./sh.txt --nthread 8 --variable-read-length -t paired --gtf /home/xxx/data/raw_ref/hg38_ucsc.annotated.gtf --od ./splicing_out/ --tmp ./temp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
gtf: 24.5085580349
There are 51276 distinct gene ID in the gtf file
There are 52638 distinct transcript ID in the gtf file
There are 50570 one-transcript genes in the gtf file
There are 542154 exons in the gtf file
There are 4654 one-exon transcripts in the gtf file
There are 3903 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 1.026562
Average number of exons per transcript is 10.299669
Average number of exons per transcript excluding one-exon tx is 11.201651
Average number of gene per geneGroup is 465.110721
statistic: 0.0104548931122
[New Thread 0x7ffff66eb700 (LWP 588062)]
[New Thread 0x7ffff5eea700 (LWP 588063)]
[New Thread 0x7ffff56e9700 (LWP 588064)]
[New Thread 0x7ffff4ee8700 (LWP 588065)]
[New Thread 0x7ffff46e7700 (LWP 588066)]
[New Thread 0x7ffff3ee6700 (LWP 588067)]
[New Thread 0x7ffff36e5700 (LWP 588068)]
novel: 7860.30525613
The splicing graph and candidate read have been saved into ./temp/2021-06-03-21:27:47_306642.rmats
save: 19.7066500187
WARNING: there are redundant temporary files.
loadsg: 5.16925311089
ase: 2.03069710732
Thread 5 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4ee8700 (LWP 588065)]
0x00007ffff6d983a7 in std::__detail::_Map_base<std::__cxx11::basic_string<char, std::char_traits
I hope this may help solve the problem
The gdb output is long but there's not much information in it. It essentially says that the segfault occurred when adding a new element to a map:
map<string, map<vector<pair<long, long>>, int>>::operator[](string)
At the gdb prompt, after the segfault happens, you might be able to run bt (back trace) to get more details. Also, if you can build rmats from source then you can get better gdb output by changing the compile flags. Replacing '-O3' with '-O0', '-ggdb' should let gdb show more details: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/setup.py#L16
Since your output includes ase: {...}, rmats is up to counting the reads toward the detected splice junctions. My guess is that the error occurred at this line, where the junction reads for a bam file are being read from a file: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L3579
I think that the only reason map::operator[] would segfault is if it could not allocate memory to add the new element. If you are able to run rmats again while watching the memory usage with htop, then I expect you will see that the memory limit is reached.
I'm getting an error using test data only when specifying threads. If I set the threads=1 then I no longer have the error.
Has anyone figured out this issue?
I ran gdb as suggested.
Here is the error that I get when nthread is greater than 1.
$ gdb python3
GNU gdb (GDB) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-conda-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...
(gdb) r /local/cluster/rmats-4.1.2/bin/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
Starting program: /local/cluster/rmats-4.1.2/bin/python3 /local/cluster/rmats-4.1.2/bin/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
gtf: 0.06883001327514648
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.00037789344787597656
[New Thread 0x7f4d0282e700 (LWP 30825)]
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
std::_Hash_bytes (ptr=<optimized out>, len=139968847681464, seed=<optimized out>)
at ../../../../libstdc++-v3/libsupc++/hash_bytes.cc:147
147 ../../../../libstdc++-v3/libsupc++/hash_bytes.cc: No such file or directory.
Compiled using devtoolset-7 on CentOS and got this error instead:
gtf: 0.07103800773620605
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.000423431396484375
[New Thread 0x7fffee668700 (LWP 45920)]
Program received signal SIGSEGV, Segmentation fault.
operator() (this=this@entry=0x7fffffff9b40, __s=...)
at /local/cluster/centos/devtoolset-7/root/usr/include/c++/7/bits/basic_string.h:6401
6401 { return std::_Hash_impl::hash(__s.data(), __s.length()); }
The gdb output shows the segfault in std::_Hash_bytes (ptr=<optimized out>, len=139968847681464, seed=<optimized out>) and then std::_Hash_impl::hash(__s.data(), __s.length()). My guess is that somehow a string is null and is then getting hashed during a map lookup, but I don't know where in the code that could happen.
It looks like you were able to compile rmats yourself. Can you try compiling after changing these flags: https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/setup.py#L16
Use: extra_compile_args = ['-O0', '-ggdb', '-funroll-loops',
instead of: extra_compile_args = ['-O3', '-funroll-loops',
Then when you run rmats under gdb and get the segfault, hopefully you can get more detailed output.
I've attached my gdb output from a build with those debug flags and also an error intentionally introduced in the code. Here are the gdb commands I ran after the segfault to get a backtrace for all threads (shown in the attached file):
(gdb) i threads : to show all threads
(gdb) t 1 : to switch to thread 1
(gdb) bt : to get the backtrace
Then repeated for t 2 and t 3.
gdb_output.txt
Also what commit are you building from?
I built from the latest release tar.gz.
I had to add -lrt -lquadmath to the LDFLAGS in the rMATS_C/Makefile to get it to work.
I'm on CentOS 7 if that matters.
I'll add those compiler flags and get back to you.
This error is from a version I compiled in a conda environment with GCC version 7.5.0. I see there is some reference to a conda file path /home/conda/feedstock_root... that isn't present.
The system GCC of CentOS 7 is only 4.8.5, so I have to do some sort of workaround. I can load a CentOS environment (devtoolset-7), as I mentioned, to use GCC 7. The version I compiled under devtoolset-7 is the one that gave the basic_string.h error.
I wonder if this is a library linkage issue rather than an issue with the code. It's somewhat odd to me that it only shows up under a threaded environment though.
Ok, this seems like it might be an issue with Cython not finding the conda std and pthread libs. Looking into it more now. It seems to be pulling from the system std and pthread libs and they are not compatible... maybe.
Thanks for your help with the compiler flags.
I added the conda lib dir to LD_LIBRARY_PATH prior to building with cython. The error message changed slightly. I wonder if something still isn't linking properly.
In gdb_log_1.txt the segfault happens with std::_Hash_impl::hash (__ptr=0x0, ... which looks like the hash being called with a null pointer. The stack trace eventually goes up to __pyx_f_5rmats_13rmatspipeline_parse_bam ... at rmatspipeline/rmatspipeline.cpp:9056
Can you check that line in rMATS_pipeline/rmatspipeline/rmatspipeline.cpp? In the local build I did, that line in the .cpp has a comment pointing to https://github.com/Xinglab/rmats-turbo/blob/v4.1.2/rMATS_pipeline/rmatspipeline/rmatspipeline.pyx#L834
It might be that the rmats code at that line is not thread safe, but it only shows up for certain compilers/libraries?
Here is the section of that file with the comments above and below it.
9049 /* "rmatspipeline/rmatspipeline.pyx":834
9050 * while cg != geneGroup[i].end():
9051 * ## for each candidate gene
9052 * if ((supple[deref(cg)].chrom != bref_name # <<<<<<<<<<<<<<
9053 * or (supple[deref(cg)].strand != strand
9054 * and dt != FRUNSTRANDED)
9055 */
9056 __pyx_t_9 = (((__pyx_v_supple[(*__pyx_v_cg)]).chrom != __pyx_v_bref_name) != 0);
9057 if (!__pyx_t_9) {
9058 } else {
9059 __pyx_t_1 = __pyx_t_9;
9060 goto __pyx_L29_bool_binop_done;
9061 }
9062
9063 /* "rmatspipeline/rmatspipeline.pyx":835
9064 * ## for each candidate gene
9065 * if ((supple[deref(cg)].chrom != bref_name
9066 * or (supple[deref(cg)].strand != strand # <<<<<<<<<<<<<<
9067 * and dt != FRUNSTRANDED)
9068 * or visited.find(deref(cg)) != visited.end())):
9069 */
Can you try building with the code in this commit: https://github.com/Xinglab/rmats-turbo/commit/864b72d5debf391aaed01a7a0aae068281a34489
It seems like geneGroup may be getting modified while being used by multiple threads. The change should stop geneGroup from getting unnecessarily modified.
(gdb) r /local/cluster/rmats-4.1.2/rmats-turbo-864b72d/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
Starting program: /local/cluster/rmats-4.1.2/bin/python3 /local/cluster/rmats-4.1.2/rmats-turbo-864b72d/rmats.py --b1 bam1.txt --b2 bam2.txt --gtf testData/test.gtf --od bam_test --tmp /data/davised/bam_test -t paired --readLength 50 --nthread 2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
gtf: 0.0843348503112793
There are 60 distinct gene ID in the gtf file
There are 869 distinct transcript ID in the gtf file
There are 0 one-transcript genes in the gtf file
There are 5885 exons in the gtf file
There are 2 one-exon transcripts in the gtf file
There are 0 one-transcript genes with only one exon in the transcript
Average number of transcripts per gene is 14.483333
Average number of exons per transcript is 6.772152
Average number of exons per transcript excluding one-exon tx is 6.785467
Average number of gene per geneGroup is 1.000000
statistic: 0.0005159378051757812
[New Thread 0x7f784b3a6700 (LWP 20131)]
read outcome totals across all BAMs
USED: 3621
NOT_PAIRED: 83367
NOT_NH_1: 20438
NOT_EXPECTED_CIGAR: 0
NOT_EXPECTED_READ_LENGTH: 0
NOT_EXPECTED_STRAND: 0
EXON_NOT_MATCHED_TO_ANNOTATION: 88143
JUNCTION_NOT_MATCHED_TO_ANNOTATION: 15598
CLIPPED: 0
total: 211167
outcomes by BAM written to: /data/davised/bam_test/2022-09-06-14_18_51_977788_read_outcomes_by_bam.txt
novel: 0.45911240577697754
The splicing graph and candidate read have been saved into /data/davised/bam_test/2022-09-06-14_18_51_977788_*.rmats
save: 0.004267215728759766
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_49_03_121583_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_31_59_503379_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-30-15_14_56_182521_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-15_38_53_078401_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-16_12_14_096702_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_27_16_794782_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-17-16_15_46_613812_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-13_49_27_586852_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-26-12_01_38_111049_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-30-12_29_32_652946_0.rmats
WARNING: A .rmats file was found with no bams listed in it. Ignoring that file: /data/davised/bam_test/2022-08-18-02_11_05_263325_0.rmats
testData/231ESRP.25K.rep-1.bam found 7 times in .rmats files
testData/231ESRP.25K.rep-2.bam found 7 times in .rmats files
testData/231EV.25K.rep-1.bam found 7 times in .rmats files
testData/231EV.25K.rep-2.bam found 7 times in .rmats files
[Thread 0x7f784b3a6700 (LWP 20131) exited]
[Inferior 1 (process 20130) exited with code 01]
Wow! You fixed it. Thanks for working through that with me. Our users will be happy to hear things are working.
If I run into more issues I'll be back.
Here is my step-by-step guide for getting this built on a system that has a version 4 gcc:
#!/usr/bin/env bash
mamba create -c conda-forge -c bioconda \
-n rmats-4.1.2 \
python=3.6.12 \
cmake=3.15.4 \
fortran-compiler \
c-compiler \
libblas \
libcblas \
liblapack \
cython=0.29.21 \
numpy=1.16.6 \
r-base=3.6.3 \
gcc_impl_linux-64=7.5.0 \
libgcc-ng=7.5.0 \
libstdcxx-ng=7.5.0 \
libgomp=7.5.0 \
gdb
# run from the conda env prefix
mkdir -p etc/conda/activate.d etc/conda/deactivate.d
touch etc/conda/activate.d/LD_env_vars.sh
touch etc/conda/deactivate.d/LD_env_vars.sh
Here are my example contents; you'll need to edit with appropriate paths:
$ cat etc/conda/activate.d/LD_env_vars.sh
#!/bin/sh
export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
export LD_LIBRARY_PATH=/local/cluster/rmats-4.1.2/lib:${LD_LIBRARY_PATH}
$ cat etc/conda/deactivate.d/LD_env_vars.sh
#!/bin/sh
export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
unset OLD_LD_LIBRARY_PATH
git clone https://github.com/Xinglab/rmats-turbo.git
cd rmats-turbo
git checkout 864b72d
Then add -lrt -lquadmath to the LDFLAGS line in rMATS_C/Makefile.
conda activate rmats-4.1.2
#!/usr/bin/env bash
if [[ ! -d PAIRADISE ]]; then
git clone https://github.com/Xinglab/PAIRADISE.git || return 1
fi
Rscript install_r_deps.R || return 1
GSL_LDFLAGS="$(gsl-config --libs)" || return 1
GSL_CFLAGS="$(gsl-config --cflags)" || return 1
export GSL_LDFLAGS || return 1
export GSL_CFLAGS || return 1
make || return 1
chmod +x rmats.py
cd ../bin
ln -s ../rmats-turbo/rmats.py .
Then when you activate the conda env, you just need to run rmats.py and you will be good to go.
To simplify the process, I'd suggest collecting the R packages and installing them with conda as well. The R install script takes some time to run, and I think it would be faster to get all of the R dependencies installed in the conda env.
Just wanted to report back, quote from one of our users:
Thanks again for all the help with this and the mouse genome. The amount of high quality, new data we’re generating with rMATs is amazing!
I'm glad to have spent the time debugging with you. Thanks for providing that level of support for this software.
Cheers!
I tried using gdb to debug the rmatspipeline package, but what ultimately resolved my issue was changing the BAM files. I recommend using samtools stats to inspect the BAM files before importing them. As for the "core dumped" error despite having 1.4 TiB of free memory, it seems rmats isn't utilizing it properly.
Hi, I used rmats-turbo with more than 1 thread for my research; it crashed and threw a segmentation fault, but when I set the thread number to 1, it ran smoothly. How did this happen? Can you figure it out? Btw, it also happens with older released versions, like 4.0.2. The error information is listed below: