Open tonyyzy opened 5 years ago
Hey, @tonyyzy,
Thanks for posting such a thorough report. Excuse me if it's unreasonable to ask, but would it be possible for you to attach the hisat2_align.cwl and folder.cwl files as well?
Hi @kkarolis
hisat2_align.cwl was attached below the workflow part. I've made the title bold so it's clearer. I have also included txt versions as attachments:
hisat2_align.cwl.txt
folder.cwl.txt
Just some thoughts: from the log above, at 2019-02-20 15:51:31 the job was submitted and the JavaScript portion was resolved correctly. At 2019-02-20 15:56:27, when the hisat job completed, cwl decided to run the JavaScript (again?) and got no output. Might be a bug somewhere?
folder.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: ExpressionTool
requirements:
  InlineJavascriptRequirement: {}
inputs:
  item: Any
  name: string
outputs:
  out: Directory
expression: "${
    if (inputs.item.class == 'File'){
      var arr = [inputs.item];
    }
    else {
      var arr = inputs.item;
    }
    return {
      'out': {
        'class': 'Directory',
        'basename': inputs.name,
        'listing': arr
      }
    }
  }"
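The expression in folder.cwl can be exercised outside of cwltool as plain JavaScript, which may help to rule out the expression itself as the source of the empty output. The following is a minimal sketch: the function name `toDirectory` and the example input objects are made up for illustration, and this is not how cwltool actually invokes the expression.

```javascript
// Standalone sketch of the folder.cwl expression body.
// `toDirectory` is a hypothetical name; cwltool supplies `inputs` itself.
function toDirectory(inputs) {
  // A single File is wrapped in an array; anything else is assumed to
  // already be an array of File/Directory objects.
  var arr = inputs.item.class === 'File' ? [inputs.item] : inputs.item;
  return {
    out: {
      class: 'Directory',
      basename: inputs.name,
      listing: arr
    }
  };
}

// Example inputs (locations are invented for illustration).
var single = toDirectory({
  item: { class: 'File', location: 'file:///tmp/a.sam' },
  name: 'SRR3584106'
});
var many = toDirectory({
  item: [
    { class: 'File', location: 'file:///tmp/a.sam' },
    { class: 'File', location: 'file:///tmp/b.log' }
  ],
  name: 'SRR3584107'
});

console.log(JSON.stringify(single, null, 2));
console.log(many.out.listing.length);
```

Both branches return a Directory object, so if the expression yields no output inside a workflow run, the cause is more likely in how the result is collected than in the expression logic.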
@kkarolis, sorry, this is the most minimal example I could create. I can provide the input yml files and point you to the test data if that helps.
I understand that parallel is supposed to be an experimental feature of cwltool and that toil might be more suitable for this purpose. However, for a single-node system, I found toil's parallel execution a bit overkill (and toil might face the same JavaScript problem, but I can't recreate the issue every time). So yeah, I think cwltool's parallel feature is really useful for a single-node, high-core-count system.
Totally missed the algo file, doh!
Re: the inputs, if it's not a huge problem for you, that would surely help!
It looks like a race condition in the handling of the nodejs standard output stream. I'm looking into it, but I will have to learn the codebase first, so this will take a bit.
By the way, have you tried rerunning the same thing on the newest version of cwltool?
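For readers unfamiliar with this class of bug, here is a generic illustration of one way that several jobs sharing a single evaluator output stream can end up with the wrong result. It is not cwltool's code and makes no claim about where the actual fault lies; every name in it (`NaiveChannel`, `TaggedChannel`, `run`) is hypothetical.

```javascript
// Generic illustration only: NOT cwltool's code. All names are hypothetical.
// One evaluator is shared by several jobs and replies arrive on one stream.

// Naive channel: each reply goes to whichever caller has waited longest.
function NaiveChannel() {
  this.waiters = [];
}
NaiveChannel.prototype.request = function (jobId, onReply) {
  this.waiters.push(onReply);
};
NaiveChannel.prototype.emit = function (jobId, reply) {
  var cb = this.waiters.shift();
  if (cb) cb(reply);
};

// Tagged channel: replies are matched to callers by job id.
function TaggedChannel() {
  this.waiters = {};
}
TaggedChannel.prototype.request = function (jobId, onReply) {
  this.waiters[jobId] = onReply;
};
TaggedChannel.prototype.emit = function (jobId, reply) {
  var cb = this.waiters[jobId];
  delete this.waiters[jobId];
  if (cb) cb(reply);
};

// Two jobs ask at about the same time; job B's reply happens to arrive first.
function run(channel) {
  var got = {};
  channel.request('A', function (r) { got.A = r; });
  channel.request('B', function (r) { got.B = r; });
  channel.emit('B', 'output-of-B');
  channel.emit('A', 'output-of-A');
  return got;
}

var naive = run(new NaiveChannel());
var tagged = run(new TaggedChannel());
console.log(naive);
console.log(tagged);
```

With the naive channel the two jobs receive each other's output as soon as replies arrive out of order, which would only show up when steps finish close together. Matching replies to requests, or serializing access to the shared stream, removes that dependence on timing.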
Partial input object using public data:
threads: 12
subject_name1: SRR3584106
subject_name2: SRR3584107
subject_name3: SRR3584108
subject_name4: SRR3584109
subject_name5: SRR3584110
subject_name6: SRR3584111
annotation:
  class: File
  location: http://ftp.ensemblgenomes.org/pub/protists/release-42/gff3/protists_alveolata1_collection/plasmodium_yoelii_gca_900002395/Plasmodium_yoelii_gca_900002395.PYYM01.42.gff3.gz # ?
fastq1:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/006/SRR3584106/SRR3584106_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/006/SRR3584106/SRR3584106_2.fastq.gz
fastq2:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/007/SRR3584107/SRR3584107_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/007/SRR3584107/SRR3584107_2.fastq.gz
fastq3:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/008/SRR3584108/SRR3584108_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/008/SRR3584108/SRR3584108_2.fastq.gz
fastq4:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/009/SRR3584109/SRR3584109_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/009/SRR3584109/SRR3584109_2.fastq.gz
fastq5:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/000/SRR3584110/SRR3584110_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/000/SRR3584110/SRR3584110_2.fastq.gz
fastq6:
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/001/SRR3584111/SRR3584111_1.fastq.gz
  - class: File
    location: http://ftp.sra.ebi.ac.uk/vol1/fastq/SRR358/001/SRR3584111/SRR3584111_2.fastq.gz
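The repetitive subject_nameN and fastqN entries could also be generated rather than typed by hand. The sketch below assumes the URL layout that can be read off the six runs in this partial input object (a six-character prefix directory, then "00" plus the last digit of the accession); that layout is an inference from these entries, not a documented guarantee. The helper names `fastqUrls` and `buildInputs` are hypothetical.

```javascript
// Sketch: build the subject_nameN / fastqN entries from run accessions.
// The URL layout is inferred from the six entries in the partial input
// object; treat it as an assumption, not a documented rule.
function fastqUrls(run) {
  var prefix = run.slice(0, 6);        // e.g. "SRR358"
  var bucket = '00' + run.slice(-1);   // e.g. "006" for SRR3584106
  var base = 'http://ftp.sra.ebi.ac.uk/vol1/fastq/' +
    prefix + '/' + bucket + '/' + run + '/';
  // Paired-end data: one File object per mate.
  return [1, 2].map(function (mate) {
    return { class: 'File', location: base + run + '_' + mate + '.fastq.gz' };
  });
}

function buildInputs(runs, threads) {
  var inputs = { threads: threads };
  runs.forEach(function (run, i) {
    inputs['subject_name' + (i + 1)] = run;
    inputs['fastq' + (i + 1)] = fastqUrls(run);
  });
  return inputs;
}

var runs = ['SRR3584106', 'SRR3584107', 'SRR3584108',
            'SRR3584109', 'SRR3584110', 'SRR3584111'];
var job = buildInputs(runs, 12);
console.log(JSON.stringify(job, null, 2));
```

The printed JSON can be saved and used as an input object, since JSON is also valid YAML. The annotation entry and any other workflow inputs are deliberately left out and would still need to be added.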
Michael, thanks for providing a partial input file!
@kkarolis , it takes a few steps to recreate the necessary test files, Plasmodium has a really small genome so hopefully it doesn't take long.
Download the genome file:
ftp://ftp.ensemblgenomes.org/pub/protists/release-42/fasta/protists_alveolata1_collection/plasmodium_yoelii_gca_900002395/dna/Plasmodium_yoelii_gca_900002395.PYYM01.dna.toplevel.fa.gz
HISAT2 doesn't accept a gzipped genome for the indexing step, so you will need to decompress the FASTA file first.
hisat2 indexing
Run this CWL script to create the index.
Note: specify --outdir so that the indexes are written to a subdirectory, e.g. cwl-runner --outdir=./HISAT2Index ./hisat2_build.cwl ./hisat2_build.yml
hisat2_build.cwl
#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: CommandLineTool
baseCommand: hisat2-build
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/hisat2:2.1.0--py27h2d50403_2
stdout: log.txt
inputs:
  reference:
    type: File
    inputBinding:
      position: 1
      prefix: -f
  basename:
    type: string
    inputBinding:
      position: 2
  threads:
    type: int
    inputBinding:
      prefix: -p
outputs:
  ht:
    type: File[]
    outputBinding:
      glob: "*"
  log:
    type: stdout
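To make the effect of the inputBinding entries in hisat2_build.cwl easier to see, the sketch below models how they order the arguments. It is heavily simplified: only `position` (defaulting to 0) and `prefix` are handled, File inputs are represented by a bare file name rather than a staged path, and `buildCommand` is a hypothetical helper that is not part of any CWL runner.

```javascript
// Simplified model of CWL argument ordering for hisat2_build.cwl.
// `buildCommand` is a hypothetical helper, not a CWL runner API.
function buildCommand(baseCommand, bindings, values) {
  var args = Object.keys(bindings)
    .map(function (name) {
      return {
        name: name,
        position: bindings[name].position || 0, // unset position counts as 0
        prefix: bindings[name].prefix
      };
    })
    // Sort by position, breaking ties by input name.
    .sort(function (a, b) {
      return a.position - b.position ||
        (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);
    });
  var argv = [baseCommand];
  args.forEach(function (arg) {
    if (arg.prefix) argv.push(arg.prefix); // prefix is a separate argument
    argv.push(String(values[arg.name]));
  });
  return argv;
}

var argv = buildCommand('hisat2-build', {
  reference: { position: 1, prefix: '-f' },
  basename: { position: 2 },
  threads: { prefix: '-p' }
}, {
  reference: 'Plasmodium_yoelii_gca_900002395.PYYM01.dna.toplevel.fa',
  basename: 'P.yoelii',
  threads: 2
});
console.log(argv.join(' '));
```

Because `threads` has no explicit position, it sorts ahead of the reference and the index basename, so the `-p` option precedes the positional arguments.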
hisat2_build.yml (replace the path to the genome file)
threads: 2
reference:
  class: File
  path: /data/rnaseq/GenomeIndex/Plasmodium_Yoelii/Plasmodium_yoelii_gca_900002395.PYYM01.dna.toplevel.fa
basename: "P.yoelii"
Input yml for the workflow: you can follow Michael's input for downloading files at runtime, but I'm not sure whether this error still pops up if the finish times of the steps are further apart. In case you need to download the 12 fastq files, they can be found here: https://www.ebi.ac.uk/ena/data/view/PRJNA322665 The top six entries are the ones you need, with two fastq files each.
workflow.yml
Replace the path of genomeDir with the path of the directory where the indexes from the previous step are stored.
Replace the path of annotation with the path of the annotation file, which can be downloaded here (NB: use this link to the GTF file rather than Michael's, which points to a GFF3 file):
ftp://ftp.ensemblgenomes.org/pub/protists/release-42/gtf/protists_alveolata1_collection/plasmodium_yoelii_gca_900002395/Plasmodium_yoelii_gca_900002395.PYYM01.42.gtf.gz
threads: 12
genomeDir:
  class: Directory
  path: /data/rnaseq/GenomeIndex/Plasmodium_Yoelii/HISAT2Index
annotation:
  class: File
  path: /data/rnaseq/GenomeIndex/Plasmodium_Yoelii/Plasmodium_yoelii_gca_900002395.PYYM01.42.gtf
subject_name1: SRR3584106
subject_name2: SRR3584107
subject_name3: SRR3584108
subject_name4: SRR3584109
subject_name5: SRR3584110
subject_name6: SRR3584111
fastq1:
- {class: File, path: /data/rnaseq/test/plas/SRR3584106_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584106_2.fastq.gz}
fastq2:
- {class: File, path: /data/rnaseq/test/plas/SRR3584107_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584107_2.fastq.gz}
fastq3:
- {class: File, path: /data/rnaseq/test/plas/SRR3584108_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584108_2.fastq.gz}
fastq4:
- {class: File, path: /data/rnaseq/test/plas/SRR3584109_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584109_2.fastq.gz}
fastq5:
- {class: File, path: /data/rnaseq/test/plas/SRR3584110_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584110_2.fastq.gz}
fastq6:
- {class: File, path: /data/rnaseq/test/plas/SRR3584111_1.fastq.gz}
- {class: File, path: /data/rnaseq/test/plas/SRR3584111_2.fastq.gz}
I have tested on the newest version of cwl, installed with pip and the --user option. I will contact the sys admin to upgrade the site package to the newest version, but I don't think it makes a difference?
Hey @tonyyzy ,
Sorry for taking so long, but I did not manage to reproduce the issue you're having. I tried rerunning with the provided test data several times on the same version you were using, and the process succeeded every time.
Hi @kkarolis Sorry it took a while to reply to your message. I'm waiting to have the computational resources again. Then I will test and let you know in a couple days. Thanks for your patience and help!
Hi @kkarolis Unfortunately, the error still persists. I tried the newest cwltool as well (20190228) and got the same error message. I do not have a local installation of nodejs, so the JavaScript part is executed in a Docker container. Could this be the cause of the error?
Expected Behavior
Jobs execute in parallel
Actual Behavior
Jobs executed in parallel, but an expression evaluation error caused successful workflow steps to fail. This error tends to occur when multiple steps finish at the same time.
Workflow Code
workflow.cwl
hisat2_align.cwl
Full Traceback
The full traceback is too long for the issue; a txt log file is attached below.
log-6.txt
Your Environment