biowdl / RNA-seq

A BioWDL pipeline for processing RNA-seq data, starting with FASTQ files to produce expression measures and VCFs. Category:Multi-Sample
https://biowdl.github.io/RNA-seq/
MIT License

Unable to run v4.0.0 or develop (#d5f7d1f) locally #85

Open leipzig opened 3 years ago

leipzig commented 3 years ago

Hi, I am interested in hosting this workflow on a Cromwell-enabled platform, but I've been seeing errors even when running it locally, on both the stable and develop branches, using these inputs derived from your internal tests:

{
    "RNAseq.cpatHex": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/CPAT/Human_Hexamer.tsv",
    "RNAseq.dbsnpVCF": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/wgs2.vcf.gz",
    "RNAseq.hisat2Index": [
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.1.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.2.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.3.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.4.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.5.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.6.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.7.ht2",
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/hisat2/reference.8.ht2"
    ],
    "RNAseq.refflatFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.refflat",
    "RNAseq.strandedness": "None",
    "RNAseq.dbsnpVCFIndex": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/wgs2.vcf.gz.tbi",
    "RNAseq.cpatLogitModel": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/CPAT/Human_logitModel.RData",
    "RNAseq.referenceFasta": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.fasta",
    "RNAseq.variantCalling": true,
    "RNAseq.lncRNAdatabases": [
        "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.gtf"
    ],
    "RNAseq.lncRNAdetection": true,
    "RNAseq.dockerImagesFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/dockerImages.yml",
    "RNAseq.referenceGtfFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.gtf",
    "RNAseq.sampleConfigFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/samplesheets/Rna3PairedEnd.yml",
    "RNAseq.referenceFastaFai": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.fasta.fai",
    "RNAseq.referenceFastaDict": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/data/reference/reference.dict"
}

On v4.0.0 I see:

java -jar cromwell-59.jar run -i PairedEndHisat2.json RNA-seq.wdl
...
  File "/usr/local/bin/biowdl-input-converter", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/biowdl_input_converter/__init__.py", line 96, in main
    output_json = samplesheet_to_json(
  File "/usr/local/lib/python3.8/site-packages/biowdl_input_converter/__init__.py", line 77, in samplesheet_to_json
    raise NotImplementedError(
NotImplementedError: Unsupported extension: 

On develop I see:

Failed to import 'expression-quantification/multi-bam-quantify.wdl' (reason 1 of 1): Failed to process workflow definition 'MultiBamExpressionQuantification' (reason 1 of 1): Failed to process 'call collectColumns.CollectColumns as mergedStringtieFPKMs' (reason 1 of 1): The call supplied a value 'sumOnDuplicateId' that doesn't exist in the task (or sub-workflow)

Either of these might be easy to resolve but I'm not sure what direction I should take. Thanks!

rhpvorderman commented 3 years ago

Please use v4.0.0, which should be stable.

The error signifies that your sampleConfigFile does not have an extension, yet you provided "RNAseq.sampleConfigFile": "https://raw.githubusercontent.com/biowdl/RNA-seq/develop/tests/samplesheets/Rna3PairedEnd.yml". So Cromwell probably changes the file name, and with it the extension, during the download process. This should be visible in the full log.
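To illustrate the failure mode described above, here is a minimal sketch of extension-based format detection. It is not the actual biowdl-input-converter code: the function name `detect_format` and the `KNOWN_EXTENSIONS` table are assumptions made for illustration. Only the error text mirrors the traceback in the report.

```python
from pathlib import Path

# Hypothetical lookup table, for illustration only; the real converter
# may support a different set of formats.
KNOWN_EXTENSIONS = {
    ".yml": "yaml",
    ".yaml": "yaml",
    ".json": "json",
    ".csv": "csv",
    ".tsv": "tsv",
}


def detect_format(path: str) -> str:
    """Guess the samplesheet format from the file name's extension."""
    suffix = Path(path).suffix  # empty string when the name has no dot
    if suffix not in KNOWN_EXTENSIONS:
        # With an extensionless name the message ends right after the colon,
        # which is what the reported traceback looks like.
        raise NotImplementedError(f"Unsupported extension: {suffix}")
    return KNOWN_EXTENSIONS[suffix]
```

Under these assumptions, a path ending in `Rna3PairedEnd.yml` is detected as YAML, while a downloaded copy stored under a purely numeric name has an empty suffix and raises the error.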

Can you try downloading the samplesheet first and adding it as a file path instead of a URI?
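A rough sketch of that workaround, applied to the whole inputs JSON rather than a single file: every `https://` value is downloaded under its original file name and replaced by a local path. The helper names `local_path_for` and `localize` and the `inputs` directory are assumptions, not part of the pipeline.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_path_for(url: str, dest_dir: str = "inputs") -> Path:
    """Keep the URL's last path component so the extension survives."""
    return Path(dest_dir) / Path(urlparse(url).path).name


def localize(value, dest_dir: str = "inputs", fetch=urllib.request.urlretrieve):
    """Recursively replace https:// strings with downloaded local paths."""
    if isinstance(value, str) and value.startswith("https://"):
        target = local_path_for(value, dest_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(value, str(target))  # download to the chosen file name
        return str(target.resolve())
    if isinstance(value, list):
        return [localize(item, dest_dir, fetch) for item in value]
    if isinstance(value, dict):
        return {key: localize(item, dest_dir, fetch) for key, item in value.items()}
    return value  # booleans and plain strings pass through unchanged
```

One would load the inputs with `json.load`, pass the resulting dict through `localize`, and write it back with `json.dump` before handing it to Cromwell. Note that this sketch stores everything in one directory, so two different URLs with the same file name would overwrite each other, and files referenced from inside the samplesheet are not fetched.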

leipzig commented 3 years ago

That works, or at least it progresses to the same problem with chunked_scatter. It appears Cromwell renames files passed as https URIs, for security purposes:

#using https://
└── -239497156
  └── 7793893292351066636
#using local filesystem
└── -153306629
  └── Rna3PairedEnd.yml

leipzig commented 3 years ago

The easiest way forward for me at this point might be a flag to allow explicit file types to be passed to https://github.com/biowdl/biowdl-input-converter rather than rely on autodetection. I'll try to cook up a PR.
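The idea could look roughly like the sketch below: an optional flag that names the format explicitly, with extension-based autodetection kept as the fallback. The flag name `--format`, the program name, and the `FORMATS` tuple are all hypothetical; the thread does not show the interface the PR actually uses.

```python
import argparse
from pathlib import Path

# Hypothetical set of formats, for illustration only.
FORMATS = ("yaml", "json", "csv", "tsv")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="samplesheet-convert")
    parser.add_argument("samplesheet", help="path to the samplesheet")
    parser.add_argument(
        "--format",
        choices=FORMATS,
        default=None,
        help="explicit samplesheet format; skips extension autodetection",
    )
    return parser


def resolve_format(samplesheet: str, explicit=None) -> str:
    """Prefer the explicit format; otherwise fall back to the extension."""
    if explicit is not None:
        return explicit
    suffix = Path(samplesheet).suffix.lower().lstrip(".")
    if suffix == "yml":
        suffix = "yaml"
    if suffix not in FORMATS:
        raise NotImplementedError(f"Unsupported extension: {suffix}")
    return suffix
```

With such a flag, a workflow task could pass the format alongside the renamed file, so that a name without an extension no longer matters.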

leipzig commented 3 years ago

I opened https://github.com/biowdl/biowdl-input-converter/pull/12. Of course, the files listed in the YAML must still be found somehow.