Azure / batch-shipyard

Simplify HPC and Batch workloads on Azure
MIT License

Snakemake Input File Missing with Batch-Shipyard #265

Closed markpearl closed 5 years ago

markpearl commented 5 years ago

Hi There,

I've tried using Batch-Shipyard to run Snakemake as per the approach: https://github.com/Azure/azure-hpc/blob/master/LifeSciences/SnakemakeBurst/docs/example.md

However, some of the rules at the beginning of the Snakefile seem to cause a permission issue when creating directories under Batch-Shipyard:

Here is the error I'm getting: [screenshot]

I've attached my Snakefile and the YAML files for the Batch-Shipyard configuration. Any help would be really appreciated! I followed the installation guide very carefully, so I think it's something specific to Snakemake. What's odd is that the pipeline works without problems when Snakemake and its dependencies are installed in Conda, but it fails when run through Batch-Shipyard.

    # Snakefile for the RNA-Seq analysis pipeline using test data from zebrafish
    # You should not need to edit this file unless you are changing the programs in the pipeline

    configfile: "config_zebrafish.yaml"
    SAMPLES = config['samples']

    R1_suffix=config['input_file_R1_suffix']
    R2_suffix=config['input_file_R2_suffix']
    genome_fasta_file = config['genome_fasta_file']
    genome_index_base = config['genome_index_base']
    merged_transcripts_file=config['merged_transcripts_file']

    rule trim_and_qc_all:
        input:
            html=expand("{sample}_R1.trimmed_paired_fastqc.html", sample=SAMPLES)

    rule trim_reads:
        input:
            R1_reads="data/{sample}" + R1_suffix,
            R2_reads="data/{sample}" + R2_suffix
        output:
            "1_trimmed_reads/{sample}_R1.trimmed_paired.fastq",
            "1_trimmed_reads/{sample}_R1.trimmed_unpaired.fastq",
            "1_trimmed_reads/{sample}_R2.trimmed_paired.fastq",
            "1_trimmed_reads/{sample}_R2.trimmed_unpaired.fastq"
        threads: config['threads']
        params:
            run_params=config['trimmomatic_params']
        shell:
            "echo -e \"#!/usr/bin/env bash\ncd $FILESHARE;\n trimmomatic PE -threads {threads} {input.R1_reads} {input.R2_reads} {output}\" > $FILESHARE/jobrun.sh ;\n $SHIPYARD/shipyard jobs add --configdir $FILESHARE/azurebatch --tail stderr.txt\n"

alfpark commented 5 years ago

At first blush, it looks like you're missing some input files. I would check that these files exist. Shipyard doesn't get involved until the "jobs add" invocation much lower and no Shipyard output exists in your screenshot. So I don't think this is related to Shipyard but something with Snakemake. I would post an issue in the azure-hpc repo to get some Snakemake-specific help.
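One way to run the check suggested above is a small Python sketch that rebuilds the `trim_reads` input paths the same way the Snakefile does (`"data/{sample}"` plus the R1/R2 suffix) and reports any that are absent. The function name `missing_inputs` and the example sample name and suffixes are made up for illustration; the real values come from `config_zebrafish.yaml`, which is not shown in this thread. It also assumes `samples` is a list or a dict keyed by sample name.

```python
from pathlib import Path


def missing_inputs(config, data_dir="data"):
    """Return the R1/R2 input paths that trim_reads would request but that do not exist."""
    missing = []
    for sample in config["samples"]:
        for key in ("input_file_R1_suffix", "input_file_R2_suffix"):
            # Same concatenation as in the Snakefile: "data/{sample}" + suffix
            path = Path(data_dir) / f"{sample}{config[key]}"
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Hypothetical values for illustration only; replace them with the
    # contents of config_zebrafish.yaml.
    example_config = {
        "samples": ["sampleA"],
        "input_file_R1_suffix": "_R1.fastq.gz",
        "input_file_R2_suffix": "_R2.fastq.gz",
    }
    for path in missing_inputs(example_config):
        print("missing:", path)
```

The paths are relative, so the script should be run from the directory where `snakemake` is invoked. Because the generated `jobrun.sh` does `cd $FILESHARE` before calling trimmomatic, it is probably worth running the same check with `data_dir` pointing at the `data` directory under the file share as well.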

markpearl commented 5 years ago

Hi Fred,

I've made a lot of progress on this, but I'm now running into an issue loading the Docker image on the compute node when the pool is created.

I've created the following issue: https://github.com/Azure/batch-shipyard/issues/268

Your help would be greatly appreciated!

Thanks,

Mark Pearl

On Fri, Mar 8, 2019 at 4:37 PM Fred Park notifications@github.com wrote:

At first blush, it looks like you're missing some input files. I would check that these files exist. Shipyard doesn't get involved until the "jobs add" invocation much lower and no Shipyard output exists in your screenshot. So I don't think this is related to Shipyard but something with Snakemake. I would post an issue in the azure-hpc repo.
