CCBR / RENEE

A comprehensive quality-control and quantification RNA-seq pipeline
https://CCBR.github.io/RENEE/
MIT License

Error called in RSEM, TIN, BAM2BW: python 3.12 missing from user path #119

Closed slsevilla closed 7 months ago

slsevilla commented 8 months ago

Josh ran into an error when running RENEE via ccbrpipeliner. The same error appeared in three rules:

bam2bw_rnaseq_pe
rsem
tin

This is the error:

[sevillas2@cn4284 logfiles]$ cat /vf/users/CCBR/projects/ccbr1327/renee_20240811/logfiles/slurmfiles/21611055.21623081.rsem.name=Sample_O3.err
/var/spool/slurm/slurmd/job21623081/slurm_script: line 3: /usr/local/apps/snakemake/conda/envs/7.32.4/bin/python3.12: No such file or directory

This is confusing because RSEM doesn't load Python, but the error is likely being passed indiscriminately to other rules (a snakemake issue). The path in the error message really is missing, although other Python versions are available:

[sevillas2@cn4284 logfiles]$ ls /usr/local/apps/snakemake/conda/envs/7.32.4/bin/python
python             python3            python3.1          python3.11         python3.11-config  python3-config
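The manual check above can be wrapped in a small helper. This is only a sketch: `check_interpreter` is a hypothetical function name, not part of RENEE, snakemake, or ccbrpipeliner. It reports whether a given interpreter path is executable and, if not, lists any `python*` entries that sit next to it.

```shell
# Hypothetical helper (an assumption for illustration, not part of RENEE).
# Usage: check_interpreter /usr/local/apps/snakemake/conda/envs/7.32.4/bin/python3.12
check_interpreter() {
    local path="$1"
    if [ -x "$path" ]; then
        echo "found: $path"
        return 0
    fi
    echo "missing: $path"
    local dir
    dir="$(dirname "$path")"
    if [ -d "$dir" ]; then
        # Show which python executables do exist alongside the missing one
        ls "$dir" | grep '^python' || true
    fi
    return 1
}
```

Running it against the path from the slurm error should print `missing: ...` followed by the versions that are actually installed.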

Not sure why this is a bug we're seeing now, randomly. I don't see anything in recent updates that would cause this change... thoughts @kelly-sovacool @kopardev

kelly-sovacool commented 8 months ago

This is bizarre. I would expect snakemake to use the python in our conda env because of this line: https://github.com/CCBR/RENEE/blob/54099ff929d9ee4e47af8a71012246a95836b7b9/bin/redirect#L37

I just launched a test run to see whether I can reproduce the error.

slsevilla commented 8 months ago

Agreed - I can't even figure out where the python call is coming from either. These tools don't actually call python 3.12...

kelly-sovacool commented 8 months ago

My test run completed successfully. This must be some sort of user-specific environment/path issue?

slsevilla commented 8 months ago

RENEE was run from the command line, using ccbrpipeliner default settings. Params for this run look ok:

RENEE   Mon Mar 11 11:14:12 EDT 2024
Running pipeline with the following parameters:
    b   /data/CCBR_Pipeliner,/gpfs/gsfs10/users/CCBR_Pipeliner,/vf/users/CCBR/rawdata/ccbr1327/psomagen_delivery/fastq_files,/data/CCBR/projects/ccbr1327/renee_20240811,/lscratch
    c   /data/CCBR/projects/ccbr1327/renee_20240811/.singularity
    e   slurm
    h   biowulf
    j   pl:renee
    o   /data/CCBR/projects/ccbr1327/renee_20240811
    t   /lscratch/$SLURM_JOBID
    w   --nowait

kelly-sovacool commented 8 months ago

Plan to troubleshoot:

  1. Ask Josh to re-run with ccbrpipeliner in case it was a transient biowulf problem. It's possible that the biowulf admins were in the middle of [un]installing/managing that shared env (/usr/local/apps/snakemake/conda/envs/7.32.4/bin/) when Josh's job ran, since everyone else has been able to run it successfully.
  2. If re-running doesn't work, ask for details about Josh's environment (e.g. .bashrc file).
  3. Consider explicitly calling python before tin.py here. It's weird that it tried to use the shared biowulf python instead of the python in the docker container.
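
For step 2, one way to gather the relevant environment details is to list every copy of an interpreter that is visible on the user's PATH. The function below is a rough sketch; `list_on_path` is a hypothetical name, and it simply walks the colon-separated PATH entries in order.

```shell
# Hypothetical helper (an assumption for illustration, not part of RENEE).
# Prints every executable file called NAME found on PATH, in PATH order.
# Usage: list_on_path python3.12
list_on_path() {
    local name="$1" dir
    local IFS=':'
    for dir in $PATH; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    return 0
}
```

Comparing the output for `python3`, `python3.11`, and `python3.12` between an affected and an unaffected user might show whether a login-time setting (for example in .bashrc) is changing which interpreter is found first.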

TJoshMeyer commented 8 months ago

Alas, re-running didn't work. I provide details on the runs and my environment in the other Issue thread about this same bug, here: https://github.com/CCBR/RENEE/issues/120

kelly-sovacool commented 7 months ago

@TJoshMeyer let's continue the conversation about the error in tin here in this issue.

I am not sure why you get this error when others do not. But based on the error messages from your renee_20240318 run, I updated the docker container for rseqc in a development version of RENEE here: /data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool/

Can you try running this version? Here's an example command you can use and modify:

/data/CCBR_Pipeliner/Pipelines/RENEE/renee-dev-sovacool/bin/renee run \
    --input /data/CCBR_Pipeliner/Pipelines/RENEE/develop/.tests/*.R1.fastq.gz \
    --genome hg38_30 \
    --mode slurm \
    --output /data/$USER/renee_test_dev \
    --sif-cache /data/CCBR_Pipeliner/SIFS \
    &> /data/$USER/renee_shell.out.txt

TJoshMeyer commented 7 months ago

Hi Kelly,

I'm getting a 'Permission denied' warning when I try to run that version you linked above. See screenshot below.

[screenshot of the 'Permission denied' message, not preserved]

-- Josh

kelly-sovacool commented 7 months ago

Oops, since you're not part of the CCBR_Pipeliner group on biowulf that location won't work. Let's try this one instead:

/data/sovacoolkl/renee-dev/bin/renee run \
    --input /data/sovacoolkl/renee-dev/.tests/*.R1.fastq.gz \
    --genome hg38_30 \
    --mode slurm \
    --output /data/$USER/renee_test_dev \
    --sif-cache /data/CCBR_Pipeliner/SIFS \
    &> /data/$USER/renee_shell.out.txt

TJoshMeyer commented 7 months ago

The latest test just completed, reporting a failure. You can find its output folder here:

/data/CCBR/projects/ccbr1327/renee_20240402/

Let me know what I can do next to help continue this troubleshooting.

kelly-sovacool commented 7 months ago

From /data/CCBR/projects/ccbr1327/renee_20240402/logfiles/snakemake.log, it looks like RENEE actually did complete all of the snakemake jobs:

[Wed Apr  3 11:38:56 2024]
Finished job 0.
166 of 166 steps (100%) done

The error came later, from the OnSuccess step: if the ccbrpipeliner module wasn't loaded, then the jobby script wasn't in the path, so it couldn't run:

run_jobby_on_snakemake_log logfiles/snakemake.log | tee logfiles/snakemake.log.jobby | cut -f2,3,18 > logfiles/snakemake.log.jobby.short
/usr/bin/bash: run_jobby_on_snakemake_log: command not found

But that's not critical, since the actual workflow did complete successfully.
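
A quick way to confirm this from the log alone is to look for the "N of N steps (100%) done" line shown above. The function name `workflow_finished` is hypothetical, and the pattern is based only on the log excerpt in this thread, so treat it as a sketch.

```shell
# Hypothetical helper (an assumption for illustration, not part of RENEE).
# Succeeds if the snakemake log contains a "... steps (100%) done" line.
# Usage: workflow_finished logfiles/snakemake.log && echo "all steps done"
workflow_finished() {
    grep -Eq '^[0-9]+ of [0-9]+ steps \(100%\) done' "$1"
}
```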

If you run this:

module load ccbrpipeliner
run_jobby_on_snakemake_log logfiles/snakemake.log | tee logfiles/snakemake.log.jobby | cut -f2,3,18 > logfiles/snakemake.log.jobby.short

and then take a look at logfiles/snakemake.log.jobby.short, all of the jobs should have a status of "COMPLETED".
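
To avoid the "command not found" failure when the module isn't loaded, the call could be guarded with `command -v`. This is a sketch of the general pattern only, not how RENEE's OnSuccess step is actually written; `run_if_available` is a hypothetical name.

```shell
# Hypothetical wrapper (an assumption for illustration, not part of RENEE).
# Runs the given command only if it is on PATH; otherwise warns and returns 0
# so that an optional reporting step does not mark the whole run as failed.
run_if_available() {
    local cmd="$1"
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "warning: $cmd not found on PATH; skipping (try 'module load ccbrpipeliner')" >&2
        return 0
    fi
    "$@"
}
```

With this in place, the first stage of the pipeline above would become `run_if_available run_jobby_on_snakemake_log logfiles/snakemake.log`, with the `tee` and `cut` stages unchanged.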

kelly-sovacool commented 7 months ago

So it seems this test run worked for you! The next question is whether it will work on the dataset you actually want to analyze.

TJoshMeyer commented 7 months ago

Thanks! This is great news. Actually, I already altered the command you gave me this morning to point at the real dataset and real output folder I wanted to use. So, if it completed successfully, I should find the RSEM counts I need to continue the analysis in NIDAP waiting for me.

I'll be sure to run the command above about the snakemake jobby script to try and get an all-clear, too. I will report back on the success of that and me finding the counts I need. Then, maybe we can put this thread to bed.

First, lunch. :)

Thanks again!

kelly-sovacool commented 7 months ago

> I should find the RSEM counts I need to continue the analysis

Likely one of the files in DEG_ALL/ is what you're looking for:

ls DEG_ALL/RSEM*
DEG_ALL/RSEM.genes.expected_count.all_samples.txt
DEG_ALL/RSEM.genes.expected_counts.all_samples.reformatted.tsv
DEG_ALL/RSEM.genes.FPKM.all_samples.txt
DEG_ALL/RSEM.genes.TPM.all_samples.txt
DEG_ALL/RSEM.isoforms.expected_count.all_samples.txt
DEG_ALL/RSEM.isoforms.FPKM.all_samples.txt
DEG_ALL/RSEM.isoforms.TPM.all_samples.txt
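
Before importing one of these files downstream, a basic sanity check may help. The sketch below assumes the file is tab-separated (as the .tsv extension suggests); `check_tsv` is a hypothetical name. It fails if the file is missing or empty and otherwise prints the number of columns in the first line.

```shell
# Hypothetical helper (an assumption for illustration, not part of RENEE).
# Usage: check_tsv DEG_ALL/RSEM.genes.expected_counts.all_samples.reformatted.tsv
check_tsv() {
    local file="$1"
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    # Count tab-separated fields on the first line
    head -n 1 "$file" | awk -F'\t' '{ print NF }'
}
```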

TJoshMeyer commented 7 months ago

You are correct! The one that ends in *.reformatted.tsv is reformatted for ease-of-import to NIDAP and our downstream workflow there. I've already checked and that file looks properly-formatted and ready-to-go!

I should now be unblocked on CCBR-1327, thank you!

I'm not sure if there's anything else for us to do in this thread. If I want to run RENEE again, using the module load version of the workflow on Biowulf instead of your custom copy used in this latest run, do I need to change something? I realize there are some Josh-specific patterns to this bug, but don't know if we've ever pinpointed what about my settings needs to change to fix this bug in future runs.

Let me know if there's anything else I should do on my end to help fix this bug going forward. Otherwise, I think I am good to go on CCBR-1327 and will have initial results to the collaborator now by end-of-week. Thanks again!

kelly-sovacool commented 7 months ago

@TJoshMeyer I'm still not sure why only you encountered the bug, but updating the docker container did fix it. I opened a PR (#123) to incorporate this update, and it will be included in the next release version of RENEE. So for now, keep using the development version in /data/sovacoolkl/renee-dev, and once we cut a new release you'll be good to go back to using the ccbrpipeliner module.

TJoshMeyer commented 7 months ago

@kelly-sovacool, I've already done the downstream workflow on NIDAP for this dataset and everything looks great! Thanks again!