Xinglab / rmats-turbo

rmats.py doesn't work in HPC conda environment #427

Open desmodus1984 opened 1 month ago

desmodus1984 commented 1 month ago

Hi,

I am trying to run rmats on an HPC cluster. I installed rmats in a new conda environment named rmats and activated both conda and the environment, but I can't run it. I got several errors.

I first ran this script:

conda init
conda activate rmats

rmats.py --b1 ctrl.H.txt --b2 ctrl.Low.txt \
        --gtf /home/juaguila/Bombus/genomic.gtf \
        -t paired --readLength 50 --nthread 12 --od ctrlHL --tmp ctrl-H-L

and I got this error:

CondaError: Run 'conda init' before 'conda activate'

/scratch/slurmd/job1374815/slurm_script: line 14: rmats.py: command not found

When I added the conda init, I got this:

no change     /home/applications/miniconda3/23.5.2/condabin/conda
no change     /home/applications/miniconda3/23.5.2/bin/conda
no change     /home/applications/miniconda3/23.5.2/bin/conda-env
no change     /home/applications/miniconda3/23.5.2/bin/activate
no change     /home/applications/miniconda3/23.5.2/bin/deactivate
no change     /home/applications/miniconda3/23.5.2/etc/profile.d/conda.sh
no change     /home/applications/miniconda3/23.5.2/etc/fish/conf.d/conda.fish
no change     /home/applications/miniconda3/23.5.2/shell/condabin/Conda.psm1
no change     /home/applications/miniconda3/23.5.2/shell/condabin/conda-hook.ps1
no change     /home/applications/miniconda3/23.5.2/lib/python3.11/site-packages/xontrib/conda.xsh
no change     /home/applications/miniconda3/23.5.2/etc/profile.d/conda.csh
no change     /home/juaguila/.bashrc
No action taken.

CondaError: Run 'conda init' before 'conda activate'

/scratch/slurmd/job1374816/slurm_script: line 15: rmats.py: command not found

Then I tried loading conda itself, though it was loaded when I started the ssh session:

module load miniconda3-23.5.2

source activate rmats

rmats.py --b1 ctrl.H.txt --b2 ctrl.Low.txt \
        --gtf /home/juaguila/Bombus/genomic.gtf \
        -t paired --readLength 50 --nthread 12 --od ctrlHL --tmp ctrl-H-L

and now I got this error:

no change     /home/applications/miniconda3/23.5.2/condabin/conda
no change     /home/applications/miniconda3/23.5.2/bin/conda
no change     /home/applications/miniconda3/23.5.2/bin/conda-env
no change     /home/applications/miniconda3/23.5.2/bin/activate
no change     /home/applications/miniconda3/23.5.2/bin/deactivate
no change     /home/applications/miniconda3/23.5.2/etc/profile.d/conda.sh
no change     /home/applications/miniconda3/23.5.2/etc/fish/conf.d/conda.fish
no change     /home/applications/miniconda3/23.5.2/shell/condabin/Conda.psm1
no change     /home/applications/miniconda3/23.5.2/shell/condabin/conda-hook.ps1
no change     /home/applications/miniconda3/23.5.2/lib/python3.11/site-packages/xontrib/conda.xsh
no change     /home/applications/miniconda3/23.5.2/etc/profile.d/conda.csh
no change     /home/juaguila/.bashrc
No action taken.

CondaError: Run 'conda init' before 'conda activate'

Traceback (most recent call last):
  File "/home/juaguila/.conda/envs/rmats/bin/rmats.py", line 19, in <module>
    from rmatspipeline import run_pipe
ModuleNotFoundError: No module named 'rmatspipeline'

I don't understand what is going on, because when I load the environment in an interactive session and run rmats.py -h, I get back all the usage information:

rmats.py -h
usage: rmats.py [options]

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --gtf GTF             An annotation of genes and transcripts in GTF format
  --b1 B1               A text file containing a comma separated list of the BAM files for sample_1.
                        (Only if using BAM)
  --b2 B2               A text file containing a comma separated list of the BAM files for sample_2.
                        (Only if using BAM)
  --s1 S1               A text file containing a comma separated list of the FASTQ files for sample_1.
                        If using paired reads the format is ":" to separate pairs and "," to separate
                        replicates. (Only if using fastq)
  --s2 S2               A text file containing a comma separated list of the FASTQ files for sample_2.
                        If using paired reads the format is ":" to separate pairs and "," to separate
                        replicates. (Only if using fastq)
  --od OD               The directory for final output from the post step
  --tmp TMP             The directory for intermediate output such as ".rmats" files from the prep step
  -t {paired,single}    Type of read used in the analysis: either "paired" for paired-end data or
                        "single" for single-end data. Default: paired
  --libType {fr-unstranded,fr-firststrand,fr-secondstrand}
                        Library type. Use fr-firststrand or fr-secondstrand for strand-specific data.
                        Only relevant to the prep step, not the post step. Default: fr-unstranded
  --readLength READLENGTH
                        The length of each read. Required parameter, with the value set according to
                        the RNA-seq read length
  --variable-read-length
                        Allow reads with lengths that differ from --readLength to be processed.
                        --readLength will still be used to determine IncFormLen and SkipFormLen
  --anchorLength ANCHORLENGTH
                        The "anchor length" or "overhang length" used when counting the number of reads
                        spanning splice junctions. A minimum number of "anchor length" nucleotides must
                        be mapped to each end of a given splice junction. The minimum value is 1 and
                        the default value is set to 1 to make use of all possible splice junction
                        reads.
  --tophatAnchor TOPHATANCHOR
                        The "anchor length" or "overhang length" used in the aligner. At least "anchor
                        length" nucleotides must be mapped to each end of a given splice junction. The
                        default is 1. (Only if using fastq)
  --bi BINDEX           The directory name of the STAR binary indices (name of the directory that
                        contains the suffix array file). (Only if using fastq)
  --nthread NTHREAD     The number of threads. The optimal number of threads should be equal to the
                        number of CPU cores. Default: 1
  --tstat TSTAT         The number of threads for the statistical model. If not set then the value of
                        --nthread is used
  --cstat CSTAT         The cutoff splicing difference. The cutoff used in the null hypothesis test for
                        differential alternative splicing. The default is 0.0001 for 0.01% difference.
                        Valid: 0 <= cutoff < 1. Does not apply to the paired stats model
  --task {prep,post,both,inte,stat}
                        Specify which step(s) of rMATS-turbo to run. Default: both. prep: preprocess
                        BAM files and generate .rmats files. post: load .rmats files into memory,
                        detect and count alternative splicing events, and calculate P value (if not
                        --statoff). both: prep + post. inte (integrity): check that the BAM filenames
                        recorded by the prep task(s) match the BAM filenames for the current command
                        line. stat: run statistical test on existing output files
  --statoff             Skip the statistical analysis
  --paired-stats        Use the paired stats model
  --darts-model         Use the DARTS statistical model
  --darts-cutoff DARTS_CUTOFF
                        The cutoff of delta-PSI in the DARTS model. The output posterior probability is
                        P(abs(delta_psi) > cutoff). The default is 0.05
  --novelSS             Enable detection of novel splice sites (unannotated splice sites). Default is
                        no detection of novel splice sites
  --mil MIL             Minimum Intron Length. Only impacts --novelSS behavior. Default: 50
  --mel MEL             Maximum Exon Length. Only impacts --novelSS behavior. Default: 500
  --allow-clipping      Allow alignments with soft or hard clipping to be used
  --fixed-event-set FIXED_EVENT_SET
                        A directory containing fromGTF.[AS].txt files to be used instead of detecting a
                        new set of events
  --individual-counts   Output individualCounts.[AS_Event].txt files and add the individual count
                        columns to [AS_Event].MATS.JC.txt

Is there any reason why rmats works in an interactive session but not when it is submitted as a job on the HPC?

Thanks.

EricKutschera commented 3 weeks ago

conda init should write some lines to your .bashrc. When you open a new shell, it should source your .bashrc and run that conda code. If you are running through a script, the script may not be using the code in your .bashrc. A possible solution is to copy the conda code from your .bashrc into the script where you're seeing the error. Here's a similar post: https://github.com/Xinglab/rmats-turbo/issues/423#issuecomment-2275700854
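As a rough illustration of that idea, the job script can source conda's conda.sh directly instead of relying on .bashrc. This is only a sketch: load_conda is a hypothetical helper name, and the conda.sh path is the one shown in the conda init output earlier in this thread, so it may differ on other clusters.

```shell
#!/bin/bash
# Hypothetical helper: load conda's shell function in a non-interactive job script.
load_conda() {
    local conda_sh="$1"
    if [ ! -f "$conda_sh" ]; then
        echo "conda.sh not found: $conda_sh" >&2
        return 1
    fi
    # Sourcing conda.sh defines the `conda` shell function that `conda activate` needs.
    . "$conda_sh"
}

# Assumption: path copied from the `conda init` output above; adjust for your cluster.
CONDA_SH="/home/applications/miniconda3/23.5.2/etc/profile.d/conda.sh"

if load_conda "$CONDA_SH"; then
    conda activate rmats
    # Print which rmats.py the job will run; it should live inside the rmats environment.
    command -v rmats.py
fi
```

If the path printed by command -v rmats.py is not inside the rmats environment, the activation probably did not take effect in the job's shell.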

desmodus1984 commented 3 weeks ago

My question is that I still get the error when running without conda init, doing what I am used to doing, which is:

  1. activating the miniconda module
  2. source activating the environment
  3. and lastly running rmats.py

This error is unique to rmats. It hasn't happened with any other software.

EricKutschera commented 3 weeks ago

You mean ModuleNotFoundError: No module named 'rmatspipeline'? That could be due to a different version of python being used. See this post: https://github.com/Xinglab/rmats-turbo/issues/67

Based on the stack trace it found /home/juaguila/.conda/envs/rmats/bin/rmats.py. There should be a file like /home/juaguila/.conda/envs/rmats/rMATS/rmatspipeline.*.so. That .so file might have the version of python in the name like 39 for python3.9
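One way to check this from inside the job script is to list the compiled module next to the Python that the job actually resolves. The snippet below is a sketch under assumptions: list_pipeline_modules is a hypothetical helper, and the directory is the one suggested above, which may not match your install.

```shell
#!/bin/bash
# Hypothetical helper: print the names of compiled rmatspipeline modules in a directory.
list_pipeline_modules() {
    local dir="$1"
    local f
    for f in "$dir"/rmatspipeline.*.so; do
        # If nothing matches, the glob stays literal, so confirm the file exists.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Assumption: directory taken from the path suggested above; adjust to your environment.
RMATS_DIR="/home/juaguila/.conda/envs/rmats/rMATS"
list_pipeline_modules "$RMATS_DIR"

# Show which python the job resolves and its version, to compare with the .so file name.
if command -v python >/dev/null 2>&1; then
    command -v python
    python --version
fi
```

If the version in the .so file name does not match the reported Python version, or python resolves to a path outside the rmats environment, that would be consistent with the ModuleNotFoundError above.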