RNA-seq, rule rseqc, infer_experiment.py and read_distribution.py are missing

acgtcoder commented 5 years ago

The two scripts are missing, so this step does not work. Where are the two scripts?

skchronicles commented 5 years ago

@acgtcoder

I have a quick question. Are you running Pipeliner on Biowulf (NIH's HPC cluster)?

skchronicles commented 5 years ago

Follow-up

When running rule rseqc, a Biowulf environment module is loaded: rseqc/2.6.4. This module adds infer_experiment.py and read_distribution.py to your $PATH.

rule rseqc:
   input: 
    file1=join(workpath,bams_dir,"{name}.star_rg_added.sorted.dmark.bam"),
   output: 
    out1=join(workpath,rseqc_dir,"{name}.strand.info"),
    out4=join(workpath,rseqc_dir,"{name}.Rdist.info")
   params: 
    bedref=config['references'][pfamily]['BEDREF'],
    rseqcver=config['bin'][pfamily]['tool_versions']['RSEQCVER'],
    rname="pl:rseqc"
   shell: """
module load {params.rseqcver}
cd {rseqc_dir}
infer_experiment.py -r {params.bedref} -i {input.file1} > {output.out1}
read_distribution.py -i {input.file1} -r {params.bedref} > {output.out4}
"""

With that being said, I am unable to reproduce your error.

If you are attempting to run Pipeliner outside of the NIH, you will need to heavily modify the snakemake file and quite a few other resource files (which is not a trivial matter).
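Whether rule rseqc can run comes down to whether the two scripts are on your $PATH after the module is loaded. A minimal sketch of that check in Python, assuming only the two script names from the rule above; the helper name `find_missing` is hypothetical and not part of Pipeliner or RSeQC:

```python
import shutil

# Script names taken from the rseqc rule above.
RSEQC_SCRIPTS = ("infer_experiment.py", "read_distribution.py")


def find_missing(scripts=RSEQC_SCRIPTS, path=None):
    """Return the scripts that cannot be found as executables on PATH.

    Hypothetical helper for illustration. `path` defaults to the current
    $PATH; pass a string to check a different search path.
    """
    return [s for s in scripts if shutil.which(s, path=path) is None]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("Both RSeQC scripts are on PATH.")
```

Run it in the same shell session in which the module was loaded; an empty result means both scripts are visible to the rule's shell commands.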

kopardev commented 5 years ago

the two scripts are missing, so this step does not work. Where are the two scripts?

@acgtcoder These are RSeQC subcommands, not scripts that we have authored.

acgtcoder commented 5 years ago

thanks a lot, Skyler! Got it.

On Thu, May 9, 2019 at 2:22 PM, Skyler Kuhn wrote:

@acgtcoder I have a quick question. Are you running Pipeliner on Biowulf (NIH's HPC cluster)?


acgtcoder commented 5 years ago

Another related question: the rule rnaseqc has never worked for me. I am trying to fix it, so I isolated the two relevant rules below and tried both a dry run and building a DAG. Snakemake always ignores rnaseqc when both rules are included, but if I keep only one rule at a time, either one is executed. What is the problem? Thanks a lot!

import os
configfile: "run.json"
workpath = config['project']['workpath']

samples=sorted(list(config['project']['units'].keys()))

from snakemake.utils import R
from os.path import join
configfile: "run.json"

from os import listdir

star_dir="STAR_files"
bams_dir="bams"
log_dir="logfiles"
rseqc_dir="RSeQC"
kraken_dir="kraken"
preseq_dir="preseq"
pfamily = 'rnaseq'

rule prernaseqc:
    input:
        expand(join(workpath,bams_dir,"{name}.star_rg_added.sorted.dmark.bam"), name=samples)
    output:
        out1=join(workpath,bams_dir,"files_to_rnaseqc.txt")
    priority: 2
    params:
        rname='pl:prernaseqc',
        batch='--mem=4g --time=04:00:00'
    run:
        with open(output.out1, "w") as out:
            out.write("Sample ID\tBam file\tNotes\n")
            for f in input:
                out.write("%s\t" % f)
                out.write("%s\t" % f)
                out.write("%s\n" % f)
            out.close()

rule rnaseqc:
    input:
        join(workpath,bams_dir,"files_to_rnaseqc.txt")
    output:
        join(workpath,"STAR_QC")
    priority: 2
    params:
        rname='pl:rnaseqc',
        batch='--mem=24g --time=48:00:00',
        bwaver=config['bin'][pfamily]['tool_versions']['BWAVER'],
        rrnalist=config['references'][pfamily]['RRNALIST'],
        rnaseqcver=config['bin'][pfamily]['RNASEQCJAR'],
        rseqcver=config['bin'][pfamily]['tool_versions']['RSEQCVER'],
        gtffile=config['references'][pfamily]['GTFFILE'],
        genomefile=config['references'][pfamily]['GENOMEFILE']
    shell: """
module load {params.bwaver}
module load {params.rseqcver}

var="{params.rrnalist}"
if [ $var == "-" ]; then
    java -Xmx48g -jar {params.rnaseqcver} -n 1000 -s {input} -t {params.gtffile} -r {params.genomefile} -o {output}
else
    java -Xmx48g -jar {params.rnaseqcver} -n 1000 -s {input} -t {params.gtffile} -r {params.genomefile} -rRNA {params.rrnalist} -o {output}
fi
"""
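One possible explanation, offered as an assumption rather than a confirmed diagnosis from the Pipeliner maintainers: when no target is named on the command line, Snakemake builds the first rule in the file. In the excerpt above that is prernaseqc, whose output does not depend on rnaseqc, so a bare dry run would never schedule rnaseqc. The sketch below adds a target rule ahead of the others; the rule name `all` is a common convention and is not taken from the excerpt.

```python
# Snakefile fragment (Snakemake's Python-based rule syntax), not standalone Python.
# Hypothetical target rule, placed before prernaseqc and rnaseqc, so that a
# bare dry run requests the final output and pulls in both rules.
rule all:
    input:
        join(workpath, "STAR_QC")
```

Alternatively, the rule can be requested by name on the command line, for example `snakemake -n rnaseqc`, since it has no wildcards.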

CCBR / Pipeliner

RNA-seq, rule rseqc, infer_experiment.py and read_distribution.py are missing #419
