Closed acgtcoder closed 5 years ago
@acgtcoder
I have a quick question. Are you running Pipeliner on Biowulf (NIH's HPC cluster)?
When running rule rseqc, a Biowulf environment module is loaded: rseqc/2.6.4. This module adds infer_experiment.py and read_distribution.py to your $PATH.
rule rseqc:
    input:
        file1=join(workpath,bams_dir,"{name}.star_rg_added.sorted.dmark.bam"),
    output:
        out1=join(workpath,rseqc_dir,"{name}.strand.info"),
        out4=join(workpath,rseqc_dir,"{name}.Rdist.info")
    params:
        bedref=config['references'][pfamily]['BEDREF'],
        rseqcver=config['bin'][pfamily]['tool_versions']['RSEQCVER'],
        rname="pl:rseqc"
    shell: """
module load {params.rseqcver}
cd {rseqc_dir}
infer_experiment.py -r {params.bedref} -i {input.file1} > {output.out1}
read_distribution.py -i {input.file1} -r {params.bedref} > {output.out4}
"""
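Since the rule relies on the module putting both scripts on $PATH, a quick way to see whether they are visible in a given environment is to look them up directly. The following is a minimal sketch, not part of Pipeliner: the helper names find_scripts and missing_scripts are hypothetical, and it only assumes the two script names used in the shell block above.

```python
import shutil

# Script names taken from the shell block of rule rseqc above.
RSEQC_SCRIPTS = ("infer_experiment.py", "read_distribution.py")


def find_scripts(names=RSEQC_SCRIPTS, path=None):
    """Map each script name to its resolved location, or None if it is not found.

    path=None searches the current $PATH; a custom search path can be
    passed instead (hypothetical convenience for testing).
    """
    return {name: shutil.which(name, path=path) for name in names}


def missing_scripts(names=RSEQC_SCRIPTS, path=None):
    """Return the script names that could not be found."""
    return [name for name, loc in find_scripts(names, path).items() if loc is None]


if __name__ == "__main__":
    for name, location in find_scripts().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

Run after the module load step (or after installing RSeQC some other way). If either script is reported as not found, the rule's shell block will fail in the same way.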
With that being said, I am unable to reproduce your error:
If you are attempting to run Pipeliner outside of the NIH, you will need to heavily modify the snakemake file and quite a few other resource files (which is not a trivial matter).
The two scripts are missing, so this step does not work. Where are the two scripts?
@acgtcoder these are rseqc subcommands and not scripts that we have authored:
thanks a lot, Skyler! Got it.
On Thu, May 9, 2019 at 2:22 PM Skyler Kuhn notifications@github.com wrote:
@acgtcoder https://github.com/acgtcoder
I have a quick question. Are you running Pipeliner on Biowulf (NIH's HPC cluster)?
Another related question: the rule rnaseqc has never worked for me. I am trying to fix it, and I isolated the two relevant rules below. I tried them in a dry run and by generating a DAG, but snakemake always ignores rnaseqc when both rules are included. If I keep only one rule at a time, either one is executed. What is the problem? Thanks a lot!
import os
configfile: "run.json"
workpath = config['project']['workpath']
samples=sorted(list(config['project']['units'].keys()))
from snakemake.utils import R
from os.path import join
configfile: "run.json"
from os import listdir
star_dir="STAR_files"
bams_dir="bams"
log_dir="logfiles"
rseqc_dir="RSeQC"
kraken_dir="kraken"
preseq_dir="preseq"
pfamily = 'rnaseq'
rule prernaseqc:
    input:
        expand(join(workpath,bams_dir,"{name}.star_rg_added.sorted.dmark.bam"), name=samples)
    output:
        out1=join(workpath,bams_dir,"files_to_rnaseqc.txt")
    priority: 2
    params:
        rname='pl:prernaseqc',
        batch='--mem=4g --time=04:00:00'
    run:
        with open(output.out1, "w") as out:
            out.write("Sample ID\tBam file\tNotes\n")
            for f in input:
                out.write("%s\t" % f)
                out.write("%s\t" % f)
                out.write("%s\n" % f)
            out.close()
rule rnaseqc:
    input:
        join(workpath,bams_dir,"files_to_rnaseqc.txt")
    output:
        join(workpath,"STAR_QC")
    priority: 2
    params:
        rname='pl:rnaseqc',
        batch='--mem=24g --time=48:00:00',
        bwaver=config['bin'][pfamily]['tool_versions']['BWAVER'],
        rrnalist=config['references'][pfamily]['RRNALIST'],
        rnaseqcver=config['bin'][pfamily]['RNASEQCJAR'],
        rseqcver=config['bin'][pfamily]['tool_versions']['RSEQCVER'],
        gtffile=config['references'][pfamily]['GTFFILE'],
        genomefile=config['references'][pfamily]['GENOMEFILE']
    shell: """
module load {params.bwaver}
module load {params.rseqcver}
var="{params.rrnalist}"
if [ $var == "-" ]; then
    java -Xmx48g -jar {params.rnaseqcver} -n 1000 -s {input} -t {params.gtffile} -r {params.genomefile} -o {output}
else
    java -Xmx48g -jar {params.rnaseqcver} -n 1000 -s {input} -t {params.gtffile} -r {params.genomefile} -rRNA {params.rrnalist} -o {output}
fi
"""
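One likely explanation for the behavior described above, assuming no target is named on the command line and the isolated file has no "rule all": Snakemake treats the first rule in the Snakefile as the default target and schedules only the jobs needed to produce that rule's outputs. With both rules present, prernaseqc comes first, and nothing it needs is produced by rnaseqc, so rnaseqc is never scheduled. The sketch below is a toy model of that backward walk from a target, not Snakemake's actual implementation; the file names are shortened stand-ins and the function names are hypothetical.

```python
# Toy model of target resolution; NOT Snakemake's real code.
# Dict order mirrors the order of the rules in the Snakefile above.
RULES = {
    "prernaseqc": {"input": ["sample.bam"], "output": ["files_to_rnaseqc.txt"]},
    "rnaseqc": {"input": ["files_to_rnaseqc.txt"], "output": ["STAR_QC"]},
}


def producer(path, rules):
    """Return the name of the rule that lists `path` as an output, if any."""
    for name, rule in rules.items():
        if path in rule["output"]:
            return name
    return None


def default_target(rules):
    """With no target requested, the first rule in the file is the target."""
    return next(iter(rules))


def jobs_for(target, rules):
    """Walk backward from `target` through its inputs; return rules in run order."""
    order = []

    def visit(name):
        if name in order:
            return
        for path in rules[name]["input"]:
            upstream = producer(path, rules)
            if upstream is not None:
                visit(upstream)
        order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    print(jobs_for(default_target(RULES), RULES))  # only the first rule
    print(jobs_for("rnaseqc", RULES))              # both rules
```

If this is the cause, naming the target explicitly (for example, snakemake -n rnaseqc for a dry run) or placing a first rule, conventionally called all, whose input is the final output should bring rnaseqc into the DAG.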