broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

get bcbio running on PAPI #3724

Closed danbills closed 5 years ago

gemmalam commented 6 years ago

Need links to the 3 specific workflows, @geoffjentry. This belongs in a different Q4 milestone, @ruchim.

geoffjentry commented 6 years ago

Specifically looking at the following: gvcf_joint, prealign, rnaseq, somatic, svcall from https://github.com/bcbio/test_bcbio_cwl

Note that there's a version of somatic with GS inputs available in the gcp subdir which might make testing smoother for that one. I've seen prealign work ok on PAPI2 but haven't had luck on anything else.

Horneth commented 6 years ago

I'm seeing the detect_sv tool in the somatic workflow fail with this error (from stderr):

[2018-11-04T19:02:19.372170Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR] Failed to complete master workflow, error code: 1
[2018-11-04T19:02:19.372320Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR] errorMessage:
[2018-11-04T19:02:19.373700Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR] Unhandled Exception in TaskRunner-Thread-masterWorkflow
[2018-11-04T19:02:19.373750Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR] Traceback (most recent call last):
[2018-11-04T19:02:19.373786Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]   File "/usr/local/share/bcbio-nextgen/anaconda/share/manta-1.4.0-1/lib/python/pyflow/pyflow.py", line 1069, in run
[2018-11-04T19:02:19.373812Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]     (retval, retmsg) = self._run()
[2018-11-04T19:02:19.373833Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]   File "/usr/local/share/bcbio-nextgen/anaconda/share/manta-1.4.0-1/lib/python/pyflow/pyflow.py", line 1121, in _run
[2018-11-04T19:02:19.373871Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]     self.workflow.workflow()
[2018-11-04T19:02:19.373894Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]   File "/usr/local/share/bcbio-nextgen/anaconda/share/manta-1.4.0-1/lib/python/mantaWorkflow.py", line 895, in workflow
[2018-11-04T19:02:19.373930Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]     graphTasks = runLocusGraph(self,dependencies=graphTaskDependencies)
[2018-11-04T19:02:19.373954Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]   File "/usr/local/share/bcbio-nextgen/anaconda/share/manta-1.4.0-1/lib/python/mantaWorkflow.py", line 296, in runLocusGraph
[2018-11-04T19:02:19.373978Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]     mergeTask = self.addTask(preJoin(taskPrefix,"mergeLocusGraph"),mergeCmd,dependencies=tmpGraphFileListTask,memMb=self.params.mergeMemMb)
[2018-11-04T19:02:19.374002Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]   File "/usr/local/share/bcbio-nextgen/anaconda/share/manta-1.4.0-1/lib/python/pyflow/pyflow.py", line 3689, in addTask
[2018-11-04T19:02:19.374023Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR]     raise Exception("Task memory requirement exceeds full available resources")
[2018-11-04T19:02:19.374046Z] [2a8fea138cb7] [72_1] [WorkflowRunner] [ERROR] Exception: Task memory requirement exceeds full available resources

The CWL requests 4GB of memory for this task, and I verified that Cromwell requested the same from PAPI:

resources:
      projectId: broad-dsde-cromwell-perf
      regions: []
      virtualMachine:
        accelerators: []
        bootDiskSizeGb: 21
        bootImage: projects/cos-cloud/global/images/family/cos-stable
        cpuPlatform: ''
        disks:
        - name: local-disk
          sizeGb: 10
          sourceImage: ''
          type: pd-ssd
        labels:
          cromwell-sub-workflow-name: wf-svcall-cwl
          cromwell-workflow-id: cromwell-0344f62e-809d-48d4-8e9a-ede11fe5dd5c
          wdl-call-alias: detect-sv
          wdl-task-name: detect-sv-cwl
        machineType: custom-2-4096
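For anyone reading the request above: the `machineType` string follows the Compute Engine custom machine type naming, `custom-<vCPUs>-<memory in MB>`. Here is a minimal sketch for decoding it, assuming the plain `custom-N-M` form only (the helper name `parse_custom_machine_type` is made up for illustration and is not part of Cromwell or PAPI):

```python
import re


def parse_custom_machine_type(machine_type):
    """Split a name like 'custom-2-4096' into (vcpus, memory_mb).

    Hypothetical helper: it only handles the plain 'custom-N-M' form.
    """
    match = re.fullmatch(r"custom-(\d+)-(\d+)", machine_type)
    if match is None:
        raise ValueError("not a plain custom machine type: %r" % machine_type)
    return int(match.group(1)), int(match.group(2))


vcpus, memory_mb = parse_custom_machine_type("custom-2-4096")
print(vcpus, memory_mb)  # prints: 2 4096
```

So the VM was created with 2 vCPUs and 4096 MB, which matches the 4GB the CWL asks for.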

@chapmanb I was curious whether you've seen this before? I'm modifying the CWL to ask for a bit more memory, but I'm wondering if there's something else that Cromwell is not doing right.

chapmanb commented 6 years ago

Thanks much for testing this out. I'm happy to help with whatever I can for supporting this. I haven't seen this previously and am kind of surprised that it hits memory issues. This is a tiny test dataset so I'm not sure why it hits a 4GB limit. It shouldn't use much memory at all. The error comes from within pyflow, which is an internal workflow system manta uses for running:

https://github.com/Illumina/pyflow/blob/aac143d6b95ddfdc1dad7b2a7226b03a41379b58/pyflow/src/pyflow.py#L3660

I wish it told us the memory it thought the system had and what it wants so we'd have more idea of what is happening.
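One way to get at the missing number is to print what the container itself reports as total memory and compare it with the request. This is only a sketch: it assumes a Linux container where `/proc/meminfo` is readable, and whether pyflow uses the same source for its own check is an assumption, not something confirmed here. The helper names are made up.

```python
import os


def parse_mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole MB (kB // 1024)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])  # value is reported in kB
            return kb // 1024
    raise ValueError("MemTotal not found")


def fits(request_mb, available_mb):
    """True when a task asking for request_mb fits in available_mb."""
    return request_mb <= available_mb


if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        total_mb = parse_mem_total_mb(handle.read())
    print("MemTotal (MB):", total_mb)
    print("4096 MB request fits:", fits(4096, total_mb))
```

Running this inside the task's container would show whether the visible total is below the nominal 4096 MB of the VM, which would be one possible explanation for the pyflow error.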

I don't think Cromwell is doing anything wrong here and asking for more memory would be the first thing I'd try as well. Let me know if this doesn't fix it and we can try to explore more. Thanks again.

Horneth commented 6 years ago

Sounds good, thanks! I'll update here once I have more info. In related news, I was able to run gvcf_joint to completion using the same inputs as in the gcp/somatic workflow (in the gs://bcbiodata/test_bcbio_cwl bucket).

chapmanb commented 6 years ago

Nice one, glad you're having success with the gvcf_joint workflow. That has more parts and the svcaller one was meant to be simpler, so having that going is a good indication you've got most of the Cromwell parts in place. Really nice, I'm excited about having this going on GCP. Thanks again for all the work.

Horneth commented 6 years ago

@chapmanb Somatic completed successfully by bumping the memory (I doubled it to 8GB) :) I have another question about the rnaseq pipeline if you don't mind. I'm hitting this error on the pipeline_summary task:

/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/cyvcf2/__init__.py:1: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .cyvcf2 import (VCF, Variant, Writer, r_ as r_unphased, par_relatedness,
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/_libs/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (hashtable as _hashtable,
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/dtypes/common.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos, lib
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/util/hashing.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import hashing, tslib
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/indexes/base.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (lib, index as libindex, tslib as libts,
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/tseries/offsets.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.tslibs.offsets as liboffsets
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/ops.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos as libalgos, ops as libops
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/indexes/interval.py:32: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs.interval import (
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/internals.py:14: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import internals as libinternals
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/sparse/array.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.sparse as splib
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/window.py:36: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.window as _window
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/groupby/groupby.py:68: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import (lib, reduction,
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/core/reshape/reshape.py:30: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos as _algos, reshape as _reshape
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py:45: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  import pandas._libs.parsers as parsers
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/pandas/io/pytables.py:50: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from pandas._libs import algos, lib, writers as libwriters
/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/gffutils/interface.py:161: UserWarning: It appears that this database has not had the ANALYZE sqlite3 command run on it. Doing so can dramatically speed up queries, and is done by default for databases created with gffutils >0.8.7.1 (this database was created with version 0.8.2) Consider calling the analyze() method of this object.
  "method of this object." % self.version)
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 223, in <module>
    runfn.process(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/runfn.py", line 58, in process
    out = fn(fnargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 52, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 208, in pipeline_summary
    return qcsummary.pipeline_summary(*args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 70, in pipeline_summary
    data["summary"] = _run_qc_tools(work_bam, work_data)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/qcsummary.py", line 162, in _run_qc_tools
    out = qc_fn(bam_file, data, cur_qc_dir)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/qc/qualimap.py", line 347, in run_rnaseq
    metrics = _parse_metrics(metrics)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/qc/qualimap.py", line 210, in _parse_metrics
    out.update({name: float(metrics[name])})
TypeError: float() argument must be a string or a number

This is what the command Cromwell generated looks like:

'bcbio_nextgen.py' 'runfn' 'pipeline_summary' 'cwl' 'sentinel_runtime=cores,2,ram,4096' 'sentinel_parallel=multi-parallel' 'sentinel_outputs=qcout_rec:summary__qc;summary__metrics;resources;description;reference__fasta__base;config__algorithm__coverage_interval;genome_build;genome_resources__rnaseq__transcripts;config__algorithm__tools_off;config__algorithm__qc;analysis;config__algorithm__tools_on;align_bam' 'sentinel_inputs=qc_rec:record' 'run_number=0'

And the cwl.inputs.json:

{
  "qc_rec": {
    "genome_build": "hg19",
    "config__algorithm__tools_on": [],
    "align_bam": {
      "nameext": ".bam",
      "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1/Test1-sort.bam",
      "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1/Test1-sort.bam",
      "size": 4028452,
      "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1",
      "secondaryFiles": [
        {
          "nameext": ".bai",
          "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1/Test1-sort.bam.bai",
          "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1/Test1-sort.bam.bai",
          "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/call-process_alignment/shard-0/align/Test1",
          "secondaryFiles": [],
          "basename": "Test1-sort.bam.bai",
          "class": "File",
          "nameroot": "Test1-sort.bam"
        }
      ],
      "basename": "Test1-sort.bam",
      "class": "File",
      "nameroot": "Test1-sort"
    },
    "description": "Test1",
    "config__algorithm__tools_off": [],
    "genome_resources__rnaseq__transcripts": {
      "nameext": ".gtf",
      "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq/ref-transcripts.gtf",
      "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq/ref-transcripts.gtf",
      "size": 15149,
      "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq",
      "secondaryFiles": [
        {
          "nameext": ".db",
          "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq/ref-transcripts.gtf.db",
          "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq/ref-transcripts.gtf.db",
          "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/rnaseq",
          "secondaryFiles": [],
          "basename": "ref-transcripts.gtf.db",
          "class": "File",
          "nameroot": "ref-transcripts.gtf"
        }
      ],
      "basename": "ref-transcripts.gtf",
      "class": "File",
      "nameroot": "ref-transcripts"
    },
    "reference__fasta__base": {
      "nameext": ".fa",
      "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa",
      "size": 37196,
      "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq",
      "secondaryFiles": [
        {
          "nameext": ".fai",
          "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
          "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.fa.fai",
          "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq",
          "secondaryFiles": [],
          "basename": "hg19.fa.fai",
          "class": "File",
          "nameroot": "hg19.fa"
        },
        {
          "nameext": ".dict",
          "location": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
          "path": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq/hg19.dict",
          "dirname": "/cromwell_root/tj-bcbio-papi/main-rnaseq.cwl/6c75cc7c-5515-45e0-9e5b-9a1b9e6fd2e1/call-qc_to_rec/bcbiodata/test_bcbio_cwl/testdata/genomes/hg19/seq",
          "secondaryFiles": [],
          "basename": "hg19.dict",
          "class": "File",
          "nameroot": "hg19"
        }
      ],
      "basename": "hg19.fa",
      "class": "File",
      "nameroot": "hg19"
    },
    "analysis": "RNA-seq",
    "resources": "{\"default\":{\"cores\":1,\"jvm_opts\":[\"-Xms1000m\",\"-Xmx2048m\"],\"memory\":\"2048M\"}}",
    "config__algorithm__qc": [
      "qualimap_rnaseq"
    ],
    "config__algorithm__coverage_interval": null
  }
}

The only thing I see that might be off is config__algorithm__coverage_interval (at the bottom of the JSON) being null. Is this something you'd expect not to be null, and could it throw off the tool?
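For a file this size it is easy to miss a null by eye, so here is a small sketch that lists every null field in a decoded JSON document. The function name `find_nulls` is made up, and the sample is a trimmed-down copy of the record above rather than the full file:

```python
import json


def find_nulls(value, path=""):
    """Return the paths of every null (None) value in a decoded JSON document."""
    found = []
    if value is None:
        found.append(path)
    elif isinstance(value, dict):
        for key, child in value.items():
            found.extend(find_nulls(child, path + "/" + key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(find_nulls(child, "%s/%d" % (path, index)))
    return found


# Trimmed-down sample; in practice load the real cwl.inputs.json with json.load().
sample = json.loads(
    '{"qc_rec": {"analysis": "RNA-seq", '
    '"config__algorithm__coverage_interval": null}}'
)
print(find_nulls(sample))  # prints: ['/qc_rec/config__algorithm__coverage_interval']
```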

chapmanb commented 6 years ago

Sorry about this. That's a bug in the qualimap parsing in bcbio that we've fixed (https://github.com/bcbio/bcbio-nextgen/commit/e15f787f984da3e5d727733f2a1d7c58c50c6be0) but hasn't yet been rolled into the Docker container. We're planning a release tomorrow so I can push a new Docker container as well which should fix the problem.

So I don't think this is a Cromwell issue but a bug on the bcbio side, and if other workflows are good, I'd skip it for now. Thanks again for all this testing.
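To illustrate the class of failure in the traceback: `float()` raises `TypeError` when it is handed something that is neither a string nor a number, for example `None`. The sketch below shows one defensive way to parse a metrics dictionary. It is not the change made in the linked bcbio commit, the function name and the metric names in the sample are hypothetical, and which value actually triggered the error in this run is not known from the thread.

```python
def parse_metrics_tolerant(raw):
    """Convert metric values to float, skipping values float() cannot handle."""
    out = {}
    for name, value in raw.items():
        try:
            out[name] = float(value)
        except (TypeError, ValueError):
            # TypeError: value is not a string or number (e.g. None).
            # ValueError: value is a string that is not numeric.
            continue
    return out


# Hypothetical metric names, for illustration only.
raw = {"mapped_reads": "1234", "duplication_rate": None, "note": "n/a"}
print(parse_metrics_tolerant(raw))  # prints: {'mapped_reads': 1234.0}
```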

Horneth commented 6 years ago

No worries, thanks for the update, I'll skip this workflow for now then :)

geoffjentry commented 6 years ago

@rebrown1395 this isn't done?