Xinglab / rmats-turbo

Missing file errors persist in tests using 4.1.1 #101

Open cb4github opened 3 years ago

cb4github commented 3 years ago

Dear all, first, thanks for all your efforts! Our campus cluster runs CentOS and is in need of some updating, and some of the dependencies used below are older than those listed in the README.md. Nonetheless, I was able to complete the build without the --conda option. (Please see the module definition and resulting module list below.)

I ran the 31 provided tests with versions of the downloaded test_rmats and run_rmats scripts that I modified to do the following.

  1. Avoid using a conda/virtual environment.
  2. Decouple our RMATS_TURBO_HOME from the SCRIPT_DIR used for testing to support item 3.
  3. Allow non-privileged users to copy and run the provided tests in their own directory.

From what I've observed among the open issues, the error that I'm getting appears to be the same as that reported first here https://groups.google.com/g/rmats-user-group/c/u-hw6Wzi1LA and then here #91 - though it was suggested in the latter that the error should be fixed in version 4.1.1.

Also, I get the same results - the same 6 failed tests out of 31 attempted tests - regardless of whether I place the tests directory in our RMATS_TURBO_HOME under /share or in my own test directory under /lustre.

Here is the complete list of failed tests (those with a non-empty rmats_stderr file) and the FileNotFoundError messages they contain.

$ find tests -name rmats_stderr ! -size 0 -print -exec grep "FileNotFoundError" {} \;
tests/prep_post/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/prep_post/out/tmp/JC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/prep_post/out/tmp/JCEC_SE/rMATS_result_P-V.txt'
tests/variable_read_length/length_1_variable/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_1_variable/out/tmp/JCEC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_1_variable/out/tmp/JC_MXE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_1_variable/out/tmp/JCEC_MXE/rMATS_result_P-V.txt'
tests/variable_read_length/length_2_variable/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_2_variable/out/tmp/JCEC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_2_variable/out/tmp/JC_MXE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/variable_read_length/length_2_variable/out/tmp/JCEC_MXE/rMATS_result_P-V.txt'
tests/variable_read_length/no_length/command_output/rmats_stderr
tests/allow_clipping/clipping_allowed/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/allow_clipping/clipping_allowed/out/tmp/JC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/allow_clipping/clipping_allowed/out/tmp/JCEC_SE/rMATS_result_P-V.txt'
tests/task_stat/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JCEC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JC_A3SS/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JCEC_A3SS/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JC_A5SS/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JCEC_A5SS/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/task_stat/out_select/tmp/JCEC_RI/rMATS_result_P-V.txt'
tests/skipped_exon_basic/command_output/rmats_stderr
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt'
FileNotFoundError: [Errno 2] No such file or directory: '/share/apps/rmats-turbo/4.1.1/tests/skipped_exon_basic/out/tmp/JCEC_SE/rMATS_result_P-V.txt'
tests/only_one_sample/stat_on/command_output/rmats_stderr
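
The find/grep listing above can also be grouped per test with a short script. This is a minimal sketch, assuming the layout shown in the listing (a file named rmats_stderr somewhere under a tests directory); the function name summarize_missing_files is hypothetical and not part of rmats-turbo.

```python
import os
import re

# Matches the quoted path at the end of a Python FileNotFoundError message.
MISSING_PATH = re.compile(r"FileNotFoundError: .*No such file or directory: '([^']+)'")


def summarize_missing_files(tests_root):
    """Map each non-empty rmats_stderr file to the missing paths it reports.

    Hypothetical helper: it only assumes that each test writes a file
    named rmats_stderr somewhere under tests_root.
    """
    summary = {}
    for dirpath, _dirnames, filenames in os.walk(tests_root):
        if "rmats_stderr" not in filenames:
            continue
        stderr_path = os.path.join(dirpath, "rmats_stderr")
        if os.path.getsize(stderr_path) == 0:
            continue  # an empty stderr file means nothing was reported
        # Keep the entry even if no FileNotFoundError is found, so that
        # tests failing for other reasons still show up in the summary.
        missing = []
        with open(stderr_path, errors="replace") as handle:
            for line in handle:
                match = MISSING_PATH.search(line)
                if match:
                    missing.append(match.group(1))
        summary[stderr_path] = missing
    return summary


if __name__ == "__main__":
    for stderr_file, missing in sorted(summarize_missing_files("tests").items()):
        print(f"{stderr_file}: {len(missing)} missing file(s)")
        for path in missing:
            print(f"  {path}")
```

Entries with zero missing files correspond to stderr files that are non-empty for some other reason, which is worth checking separately.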

Here is my current module definition (and resulting module list further below) - including...

  1. python3 in our existing anaconda3 (though not a virtual environment per se) and
  2. our existing R/3.6.1-intel with a .Rprofile file to prepend RMATS_TURBO_HOME/R/Library to the .libPaths() list.
$ module show rmats-turbo/4.1.1
-------------------------------------------------------------------
/share/apps/moduleFiles/rmats-turbo/4.1.1:

module-whatis    rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). The major difference between rMATS turbo and rMATS is speed and space usage. 
module       load anaconda3/5.1.0 R/3.6.1-intel cmake/3.12.1 samtools/1.10 gsl/2.6 
module       swap intel-psxe/2015-update1 intel-psxe/2019-update1 
prepend-path     PATH /share/apps/rmats-turbo/4.1.1 
prepend-path     LD_LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
prepend-path     LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
setenv       RMATS_TURBO_HOME /share/apps/rmats-turbo/4.1.1 
setenv       PYTHONPATH /share/apps/rmats-turbo/4.1.1 
setenv       R_PROFILE_USER /share/apps/rmats-turbo/4.1.1/.Rprofile 
-------------------------------------------------------------------

Here is the resulting module list.

$ module list
Currently Loaded Modulefiles:
  1) slurm/14.03.0             6) zlib/1.2.8               11) gcc/4.9.4
  2) idev                      7) bzip2/1.0.6              12) cmake/3.12.1
  3) bbcp/amd64_rhel60         8) xz/5.2.2                 13) samtools/1.10
  4) anaconda3/5.1.0           9) pcre/8.39                14) gsl/2.6
  5) intel-psxe/2019-update1  10) R/3.6.1-intel            15) rmats-turbo/4.1.1

Please advise, thanks, and please let me know if you'd prefer more information.

Best, CB

EricKutschera commented 3 years ago

Thanks for providing a detailed description of the error. I think the FileNotFoundError is from this line: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rmats.py#L416

Using the skipped_exon_basic test as an example, that line is trying to run a command similar to:

python ./rMATS_P/FDR.py ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_FDR.txt

The file that is not found should have been created by the command one line earlier, which would be similar to:

./rMATS_C/rMATSexe -i ./tests/skipped_exon_basic/out/JC.raw.input.SE.txt -t 1 -o ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt -c 0.0001

When the test runs that command it ignores the output. Could you try running that ./rMATS_C/rMATSexe command in your environment to see if an error is reported? It might be failing to load a shared library like libblas. If no error is reported, could you post the contents of tests/skipped_exon_basic/command_output/rmats_std{out,err} as well?
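
Because the test discards the output of that command, it can help to rerun it with the output and exit status captured. The following is a rough sketch, not how rmats.py itself invokes rMATSexe; run_and_report and describe_returncode are hypothetical helper names, and the demo command is a stand-in to be replaced with the rMATSexe command line.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a short description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_report(command):
    """Run a command, keep its output, and describe how it ended."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return {
        "status": describe_returncode(completed.returncode),
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    # Stand-in command for demonstration only; substitute the
    # ./rMATS_C/rMATSexe command line shown above.
    report = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
    print(report["status"])
    print(report["stderr"], end="")
```

A status such as "killed by SIGSEGV" would point to a crash in the executable, while a non-zero exit status with a loader message in stderr would point to a missing shared library.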

cb4github commented 3 years ago

Thanks so much for your quick response.

Here's my result for the suggested command - along with contents of the input file.

$ $RMATS_TURBO_HOME/rMATS_C/rMATSexe -i ./tests/skipped_exon_basic/out/JC.raw.input.SE.txt -t 1 -o ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt -c 0.0001
number of thread=1; input file=./tests/skipped_exon_basic/out/JC.raw.input.SE.txt; output folder=./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt; cutoff=0.0001;
Testing 0
Segmentation fault
$ cat tests/skipped_exon_basic/out/JC.raw.input.SE.txt 
ID  IJC_SAMPLE_1    SJC_SAMPLE_1    IJC_SAMPLE_2    SJC_SAMPLE_2    IncFormLen  SkipFormLen
0   1,1 0,0 0,0 1,1 98  49

Please let me know if you need more info, thanks again!

cb4github commented 3 years ago

Update:

I've rebuilt using Cython 0.29.22 instead of 0.28.5. See the informative post here: https://github.com/Xinglab/rmats-turbo/issues/73#issuecomment-755739242

The result is much better.

rMATS_C/rMATSexe -i ./tests/skipped_exon_basic/out/JC.raw.input.SE.txt -t 1 -o ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt -c 0.0001
number of thread=1; input file=./tests/skipped_exon_basic/out/JC.raw.input.SE.txt; output folder=./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt; cutoff=0.0001;
Fail to open!Total Wallclock time taken 0 seconds 0 milliseconds
Wallclock time per thread taken 0 seconds 0 milliseconds

I hope to rerun all tests shortly with the new build and report the new result. Thanks again! Best, CB
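
Since the command above uses relative paths, one thing worth ruling out is that the input file or the output directory does not resolve from the current working directory. This is an assumption about a possible cause of "Fail to open!", not a confirmed diagnosis, and check_paths is a hypothetical helper rather than part of rmats-turbo.

```python
import os


def check_paths(input_file, output_file):
    """Return a list of path problems that could prevent a run from starting.

    Hypothetical pre-flight check: the input must be a readable file and
    the directory that will hold the output must exist and be writable.
    """
    problems = []
    if not os.path.isfile(input_file):
        problems.append(f"input file not found: {os.path.abspath(input_file)}")
    elif not os.access(input_file, os.R_OK):
        problems.append(f"input file not readable: {os.path.abspath(input_file)}")

    output_dir = os.path.dirname(os.path.abspath(output_file))
    if not os.path.isdir(output_dir):
        problems.append(f"output directory not found: {output_dir}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory not writable: {output_dir}")
    return problems


if __name__ == "__main__":
    # Paths copied from the command above; run from the same directory.
    issues = check_paths(
        "./tests/skipped_exon_basic/out/JC.raw.input.SE.txt",
        "./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt",
    )
    print("\n".join(issues) if issues else "paths look usable")
```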

EricKutschera commented 3 years ago

The Cython version should not have caused rMATS_C/rMATSexe to give a segmentation fault since Cython is not used for that part of the code. The output after you updated the build includes "Fail to open!" which seems to be from this line: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_C/src/util.c#L238

It might be that the input file path was incorrect and that rMATSexe failed for that reason before it reached the part that would have caused a seg fault. If you get the seg fault again, you can rebuild with debug info by replacing -O2 with -O0 -ggdb here: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/rMATS_C/Makefile#L14, then deleting rMATS_C/rMATSexe and rerunning the build.

Then you could run the command in gdb to get a stack trace for the seg fault. First, start gdb with rMATSexe:

gdb rMATS_C/rMATSexe

Then, at the (gdb) prompt, run the program with those arguments:

r -i ./tests/skipped_exon_basic/out/JC.raw.input.SE.txt -t 1 -o ./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt -c 0.0001

If the program seg faults, run bt at the (gdb) prompt to get a backtrace that should give more details about the error.
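
On a cluster where an interactive session is inconvenient, the same gdb steps can be run non-interactively. This sketch only builds the command line, using gdb's -batch, -ex, and --args options; gdb_backtrace_command is a hypothetical helper name, and the paths are the ones from the example above.

```python
import shlex


def gdb_backtrace_command(executable, program_args):
    """Build a gdb command line that runs a program and prints a backtrace."""
    return [
        "gdb",
        "-batch",      # exit gdb once the commands below have finished
        "-ex", "run",  # start the program under gdb
        "-ex", "bt",   # print the backtrace after the program stops
        "--args", executable, *program_args,
    ]


if __name__ == "__main__":
    command = gdb_backtrace_command(
        "rMATS_C/rMATSexe",
        [
            "-i", "./tests/skipped_exon_basic/out/JC.raw.input.SE.txt",
            "-t", "1",
            "-o", "./tests/skipped_exon_basic/out/tmp/JC_SE/rMATS_result_P-V.txt",
            "-c", "0.0001",
        ],
    )
    # Print a copy-pasteable shell command instead of running it here.
    print(" ".join(shlex.quote(part) for part in command))
```

If the program exits without crashing, the bt step has no stack to print, so the backtrace is only informative when the seg fault actually reproduces.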

cb4github commented 3 years ago

For now - and for various reasons I've admittedly lost track of - I've reverted back to retrying the build with --conda and gcc/6.3.0 - assuming that's ok. But then again - could gcc/6.3.0 somehow be too recent (too far ahead of the stated >=5.4.0)?

Briefly, I've tried a few things, but my build continues to fail attempting to build rMATSexe - with details of the latest attempt/failure in the following.

Here's my updated module list.

module list
Currently Loaded Modulefiles:
  1) slurm/14.03.0             7) bzip2/1.0.6              13) samtools/1.10
  2) idev                      8) xz/5.2.2                 14) gsl/2.6
  3) bbcp/amd64_rhel60         9) pcre/8.39                15) star/2.5.2a
  4) anaconda3/5.1.0          10) R/3.6.1-intel            16) gcc/6.3.0
  5) intel-psxe/2019-update1  11) gcc/4.9.4                17) rmats-turbo/4.1.1
  6) zlib/1.2.8               12) cmake/3.12.1

Also, here's my updated module definition.

module show rmats-turbo/4.1.1
-------------------------------------------------------------------
/share/apps/modulefiles/rmats-turbo/4.1.1:

module-whatis    rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). The major difference between rMATS turbo and rMATS is speed and space usage. 
module       load anaconda3/5.1.0 R/3.6.1-intel cmake/3.12.1 samtools/1.10 gsl/2.6 star/2.5.2a gcc/6.3.0 
module       swap intel-psxe/2015-update1 intel-psxe/2019-update1 
prepend-path     PATH /share/apps/rmats-turbo/4.1.1 
prepend-path     LD_LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
prepend-path     LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
prepend-path     PYTHONUSERBASE /share/apps/rmats-turbo/4.1.1 
setenv       RMATS_TURBO_HOME /share/apps/rmats-turbo/4.1.1 
setenv       CONDA_ENVS_PATH /share/apps/rmats-turbo/4.1.1/conda_envs 
setenv       PYTHONPATH /share/apps/rmats-turbo/4.1.1 
setenv       R_PROFILE_USER /share/apps/rmats-turbo/4.1.1/.Rprofile 

Thus I've created the rmats conda virtual environment and modified line 13 of rMATS_C/Makefile to include the ...rmats/lib as follows.

LDFLAGS := -lm -lgfortran -lgsl -lgslcblas -lgomp -L$(CONDA_ENVS_PATH)/rmats/lib -lblas -llapack

However, the build is failing, and my latest error is the following.

head 1
rm: cannot remove `*.dylib': No such file or directory
make: [build] Error 1 (ignored)
lbfgs_scipy/lbfgsb.o: In function `setulb_':
lbfgsb.f:(.text+0x8b): undefined reference to `s_cmp'
lbfgs_scipy/lbfgsb.o: In function `mainlb_':
lbfgsb.f:(.text+0x54c): undefined reference to `s_cmp'
lbfgsb.f:(.text+0x700): undefined reference to `s_copy'
lbfgsb.f:(.text+0x730): undefined reference to `f_open'
lbfgsb.f:(.text+0x7a1): undefined reference to `s_cmp'
lbfgsb.f:(.text+0xa8e): undefined reference to `s_cmp'

Also, here's the list of blas-related dependencies in the rmats/lib.

source activate rmats
(rmats) $ conda list | grep blas
libblas                   3.9.0                8_openblas    conda-forge
libcblas                  3.9.0                8_openblas    conda-forge
liblapack                 3.9.0                8_openblas    conda-forge
libopenblas               0.3.12          pthreads_h4812303_1    conda-forge

Shouldn't rmats/lib/libblas.so (or another dependency provided by conda) be supplying the undefined references mentioned above? Best, CB

cb4github commented 3 years ago

Update: I've overcome the previous error and succeeded in building rMATSexe by adding -L/our/path/to/libg2c.so and -lg2c to LDFLAGS in rMATS_C/Makefile. I hope to report updated test results shortly after revisiting the solution(s) for "No module named 'rmatspipeline'". Thanks.
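
The fact that linking libg2c resolved the errors suggests that s_cmp, s_copy, and f_open come from a Fortran runtime library rather than from BLAS. One way to check which library defines a symbol is to inspect its dynamic symbol table with nm. This is a minimal sketch, assuming GNU nm is on the PATH; defined_symbols and missing_symbols are hypothetical helper names, and the library path in the demo is the placeholder from the post above.

```python
import subprocess
import sys


def defined_symbols(nm_output):
    """Collect symbol names from `nm -D --defined-only` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Typical lines look like "0000000000001234 T s_cmp".
        if len(fields) == 3:
            # Drop any version suffix such as "@@VERS_1".
            symbols.add(fields[2].split("@")[0])
    return symbols


def missing_symbols(library_path, wanted):
    """Return the wanted symbols that the shared library does not define."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return sorted(set(wanted) - defined_symbols(result.stdout))


if __name__ == "__main__":
    # Placeholder path from the post; pass the real library as an argument.
    library = sys.argv[1] if len(sys.argv) > 1 else "/our/path/to/libg2c.so"
    try:
        print(missing_symbols(library, ["s_cmp", "s_copy", "f_open"]))
    except (OSError, subprocess.CalledProcessError) as error:
        print(f"could not inspect {library}: {error}")
```

An empty list means the library defines all three symbols; running the same check against libblas.so would show whether it could have supplied them.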

cb4github commented 3 years ago

My build and tests - no longer using conda - succeeded using the following.

Here's the build command.

CC=$(which gcc) CXX=$(which g++) CXX11=$(which g++) ./build_rmats

Here's the modified version of line 13 of the file rMATS_C/Makefile, including a -L option pointing to the directory containing libg2c.so to avoid the "undefined...s_cmp" and related errors.

LDFLAGS := -lm -lgfortran -lgsl -lgslcblas -lgomp -L/cm/shared/apps/openblas/0.2.8/lib -lopenblas -L/cm/shared/apps/lapack/open64/64/3.5.0 -llapack -L$(RMATS_TURBO_HOME) -lg2c

Here's the module definition.

$ module show rmats-turbo/4.1.1
-------------------------------------------------------------------
/share/apps/modulefiles/rmats-turbo/4.1.1:

module-whatis    rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). The major difference between rMATS turbo and rMATS is speed and space usage. 
module       load anaconda3/5.1.0 R/3.6.1-intel cmake/3.12.1 samtools/1.10 gsl/2.6 star/2.5.2a openblas/dynamic/0.2.8 lapack/open64/64/3.5.0 
prepend-path     PATH /share/apps/rmats-turbo/4.1.1 
prepend-path     LD_LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
prepend-path     LIBRARY_PATH /share/apps/rmats-turbo/4.1.1/lib 
prepend-path     PYTHONUSERBASE /share/apps/rmats-turbo/4.1.1 
setenv       RMATS_TURBO_HOME /share/apps/rmats-turbo/4.1.1 
setenv       PYTHONPATH /share/apps/rmats-turbo/4.1.1 
setenv       R_PROFILE_USER /share/apps/rmats-turbo/4.1.1/.Rprofile 
-------------------------------------------------------------------

Here's the resulting module list after loading that module.

$ module list
Currently Loaded Modulefiles:
  1) slurm/14.03.0                     10) R/3.6.1-intel
  2) idev                              11) gcc/4.9.4
  3) bbcp/amd64_rhel60                 12) cmake/3.12.1
  4) anaconda3/5.1.0                   13) samtools/1.10
  5) intel-psxe/2015-update1(default)  14) gsl/2.6
  6) zlib/1.2.8                        15) star/2.5.2a
  7) bzip2/1.0.6                       16) openblas/dynamic/0.2.8
  8) xz/5.2.2                          17) lapack/open64/64/3.5.0
  9) pcre/8.39                         18) rmats-turbo/4.1.1

Also, all 31 tests (finally :) ran successfully without conda, using a modified test_rmats (as I'd indicated previously) per the following diff. With conda, the tests still fail due to a missing rmatspipeline module, and I'm not sure why.

$ diff /released/test_rmats /share/apps/rmats-turbo/4.1.1/test_rmats_cluster
4c4
<   local REL_SCRIPT_DIR="$(dirname ${BASH_SOURCE[0]})" || return 1
---
>   local REL_SCRIPT_DIR="${PWD}" || return 1
12,14c12,14
<   local RUN_RMATS="${SCRIPT_DIR}/run_rmats"
<   local CP_WITH_PREFIX="${SCRIPT_DIR}/cp_with_prefix.py"
<   local PREPARE_STAT_INPUTS="${SCRIPT_DIR}/rMATS_P/prepare_stat_inputs.py"
---
>   local RUN_RMATS="${RMATS_TURBO_HOME}/run_rmats"
>   local CP_WITH_PREFIX="${RMATS_TURBO_HOME}/cp_with_prefix.py"
>   local PREPARE_STAT_INPUTS="${RMATS_TURBO_HOME}/rMATS_P/prepare_stat_inputs.py"
26,27c26,29
<   conda create --prefix "${CONDA_ENV_PATH}" || return 1
<   conda activate "${CONDA_ENV_PATH}" || return 1
---
>   conda create --prefix "${CONDA_ENV_PATH}" --yes
>   source activate "${CONDA_ENV_PATH}" || return 1
>   which python
>   python --version
31c33,34
<   conda install -c conda-forge -c bioconda python=3.8 samtools=1.9 || return 1
---
>   conda list |grep samtools |grep 1.9 && return 0
>   conda install -c conda-forge -c bioconda python=3.8 samtools=1.9 --yes || return 1
39,40c42,46
<   create_and_activate_conda_env || return 1
<   install_dependencies || return 1
---
>   local CONDA_ENV_PATH="${SCRIPT_DIR}/conda_envs"
>   if [ -d $CONDA_ENV_PATH ]; then
>     create_and_activate_conda_env || return 1
>       install_dependencies || return 1
>   fi
42a49,50
>   echo "sys.path in test_rmats_cluster"
>   python -c 'import sys; print("\n".join(sys.path))' || return 1
52c60,62
<   conda deactivate || return 1
---
>   if [ -d $CONDA_ENV_PATH ]; then
>       source deactivate || return 1
>   fi
54a65,66
> which python
> python --version

Comments welcome.

Also, I'd appreciate any information that you can provide (preferably updated from #12) that would help determine which artifacts I can remove post-build because they aren't needed for either runtime or user reference. Here's what the module directory looks like at the time of writing.

$ ls -lt /share/apps/rmats-turbo/4.1.1
total 7484
-rw-r--r--  1 cbaribault hpcstaff    3407 Mar 25 01:16 README_cluster
drwxr-xr-x  4 cbaribault hpcstaff    4096 Mar 25 00:11 rMATS_pipeline
-rwxr-xr-x  1 cbaribault hpcstaff    2031 Mar 24 22:52 test_rmats_cluster
-rwxr-xr-x  1 cbaribault hpcstaff 7528456 Mar 24 19:50 rmatspipeline.cpython-36m-x86_64-linux-gnu.so
drwxr-xr-x 10 cbaribault hpcstaff    4096 Mar 24 19:49 rMATS_C
drwxr-xr-x  2 cbaribault hpcstaff    4096 Mar 24 19:48 lib
drwxr-xr-x  8 cbaribault hpcstaff    4096 Mar 24 19:15 bamtools
-rwxr-xr-x  1 cbaribault hpcstaff     158 Mar 24 18:07 build_rmats_cluster
drwxr-sr-x  3 cbaribault hpcstaff    4096 Mar 17 15:40 R
drwxr-sr-x  6 cbaribault hpcstaff    4096 Mar 17 15:38 PAIRADISE
-rwxr-xr-x  1 cbaribault hpcstaff    2748 Feb  8 07:04 build_rmats
-rwxrwxr-x  1 cbaribault hpcstaff     866 Feb  8 07:04 cp_with_prefix.py
drwxr-xr-x  2 cbaribault hpcstaff    4096 Feb  8 07:04 docs
-rwxrwxr-x  1 cbaribault hpcstaff     749 Feb  8 07:04 install_r_deps.R
-rw-r--r--  1 cbaribault hpcstaff    1781 Feb  8 07:04 LICENSE
-rw-r--r--  1 cbaribault hpcstaff     146 Feb  8 07:04 python_conda_requirements.txt
-rw-r--r--  1 cbaribault hpcstaff      14 Feb  8 07:04 r_conda_requirements.txt
-rw-r--r--  1 cbaribault hpcstaff   23763 Feb  8 07:04 README.md
drwxr-xr-x  2 cbaribault hpcstaff    4096 Feb  8 07:04 rMATS_P
-rwxrwxr-x  1 cbaribault hpcstaff   22703 Feb  8 07:04 rmats.py
drwxr-xr-x  2 cbaribault hpcstaff    4096 Feb  8 07:04 rMATS_R
-rwxr-xr-x  1 cbaribault hpcstaff     841 Feb  8 07:04 run_rmats
-rwxrwxr-x  1 cbaribault hpcstaff     711 Feb  8 07:04 setup_environment.sh

Best, CB

EricKutschera commented 3 years ago

The artifacts that are needed after the build for running rMATS can be seen in the Dockerfile. It copies the necessary files and then removes the build directory: https://github.com/Xinglab/rmats-turbo/blob/v4.1.1/Dockerfile#L30

I think just README.md is needed for user reference.