broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

final_workflow_outputs_dir ignored for CWL workflow run in server mode #5105

Closed dpfoose closed 2 years ago

dpfoose commented 5 years ago

On Jira as BA-5890.

Environment: Ubuntu 18.04 in a Docker container (on an Ubuntu 18.04 host). Server mode with the Local backend.

I'm listing this as low priority because it might be too specific to my use case.

When I specify final_workflow_outputs_dir in the workflowOptions file in my request to the API, the value is ignored and the output files remain in their default location relative to the execution path after the workflow execution succeeds.

For example, these are the results of a workflow with one File output named csvFile.

workflowOptions file (I have tried this with and without "file://", with the same results):

{
  "final_workflow_outputs_dir": "file:///data/external/workflow_1/4fd344c8-a228-421b-b561-9ed516e2316c",
  "use_relative_output_paths": true
}

Final outputs

{
 "cwl_temp_file_ad3d3e78-d6a6-421a-9111-86fdefe14b80.cwl.csvFile": "\"/cromwell-executions/cwl_temp_file_ad3d3e78-d6a6-421a-9111-86fdefe14b80.cwl/ad3d3e78-d6a6-421a-9111-86fdefe14b80/call-getdataframe/execution/glob-aae5e4d226234858387812bc5d30218c/217.csv\""
}

After the workflow successfully executes, the specified output directory remains empty.
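Until the option is honored, one client-side stopgap is to copy the reported outputs into the target directory yourself. The following is only a sketch under several assumptions: the helper name `copy_final_outputs` is hypothetical, the outputs mapping has the shape shown under "Final outputs" above (including the extra embedded quotes around each path), and the `cromwell-executions` paths are readable from wherever the script runs, which in a container setup would require an additional shared mount.

```python
import shutil
from pathlib import Path


def copy_final_outputs(outputs, dest_dir):
    """Copy every file listed in a Cromwell outputs mapping into dest_dir.

    `outputs` maps output names to path strings, as in the report above.
    Returns the list of destination paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, value in outputs.items():
        # The CWL outputs in this report carry embedded double quotes,
        # e.g. '"/cromwell-executions/.../217.csv"', so strip them first.
        src = Path(str(value).strip('"'))
        if not src.is_file():
            # Skip non-file outputs and paths not visible from this host.
            continue
        target = dest / src.name  # flat layout; same-named files would collide
        shutil.copy2(src, target)
        copied.append(target)
    return copied
```

Calling `copy_final_outputs(outputs, "/data/external/workflow_1/<workflow id>")` after the workflow succeeds would approximate what final_workflow_outputs_dir is expected to do, although it flattens the directory structure rather than reproducing relative output paths.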

Environment notes: I'm using Cromwell in server mode in a Docker container as a service to be consumed by other Docker applications on the same host. The client applications communicate with the Cromwell container using the Python requests library. The specified final_workflow_outputs_dir is located in a bind mount accessible from both containers at the same location (e.g. /data/external is the default directory "external" to the containers, and it is mounted on all containers at that location). I have a workaround with a workflow step that makes a request back to the client service, but this is not ideal because it requires users to modify their workflows. The client software includes an Angular application for editing workflows using Rabix's cwl-svg.

Full workflow

$namespaces: {sbg: https://www.sevenbridges.com}
class: Workflow
cwlVersion: v1.0
doc: A test workflow to demonstrate the editor.
id: workflow1
inputs:
- {id: omics_url, sbg:x: -158.51063537597656, sbg:y: 29.940061569213867, type: string}
- {id: omics_auth_token, sbg:x: -214.89361572265625, sbg:y: 170.31314086914062, type: string}
- {id: collection_id, sbg:x: -152.0425567626953, sbg:y: 306.3538818359375, type: int}
label: Test Workflow
outputs:
- id: csvFile
  outputSource: [getdataframe/csvFile]
  sbg:x: 523.4833374023438
  sbg:y: 191.5
  type: File
requirements:
- {class: MultipleInputFeatureRequirement}
steps:
- id: getcollection
  in:
  - {id: collection_id, source: collection_id}
  - id: omics_url
    source: [omics_url, omics_url, omics_url, omics_url]
  - id: omics_auth_token
    source: [omics_auth_token, omics_auth_token, omics_auth_token, omics_auth_token]
  label: Get Collection
  out:
  - {id: collection_file}
  run:
    baseCommand: [getcollection.py]
    class: CommandLineTool
    cwlVersion: v1.0
    doc: Get a collection as an HDF5 file.
    id: getcollection
    inputs:
    - id: collection_id
      inputBinding: {position: 0}
      type: int
    - id: omics_url
      inputBinding: {position: 1}
      type: string
    - id: omics_auth_token
      inputBinding: {position: 2}
      type: string
    label: Get Collection
    outputs:
    - id: collection_file
      outputBinding: {glob: '*.h5'}
      type: File
  sbg:x: 32.978721618652344
  sbg:y: 166.41757202148438
- id: getdataframe
  in:
  - {id: inputFile, source: getcollection/collection_file}
  label: Get DataFrame
  out:
  - {id: csvFile}
  run:
    baseCommand: [getdataframe.py]
    class: CommandLineTool
    cwlVersion: v1.0
    doc: Get a Pandas DataFrame as a CSV file.
    id: getdataframe
    inputs:
    - doc: A collection.
      id: inputFile
      inputBinding: {position: 0}
      type: File
    - default: true
      doc: Whether the column names should be just the x value or Y_x
      id: numericColumns
      inputBinding: {position: 1}
      type: boolean
    - default: true
      doc: Whether to include label columns
      id: includeLabels
      inputBinding: {position: 2}
      type: boolean
    - default: false
      doc: Whether to only include label columns. Overrides includeLabels.
      id: includeOnlyLabels
      inputBinding: {position: 3}
      type: boolean
    label: Get DataFrame
    outputs:
    - doc: A CSV file containing a Pandas DataFrame.
      id: csvFile
      outputBinding: {glob: '*.csv'}
      type: File
  sbg:x: 340
  sbg:y: 190
svitkovsergey commented 5 years ago

Hi @dpfoose! I've run into much the same problem: I need to provide a workflowOptions file to Cromwell running in server mode, but I doubt this is possible. When you run a WDL workflow on Cromwell in run mode, you can pass this file via --options, e.g.:

java -jar cromwell-45.jar run --options myWorkfloOptions.json

However, server mode does not seem to have an --options argument, and I have no idea how to pass the file in that case. Please let me know if you figure out how to pass an options file to Cromwell running in server mode.

geoffjentry commented 5 years ago

@likeanowl In server mode, it's just part of the API request.
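For readers landing here, a minimal sketch of such a request using the Python requests library. It assumes a Cromwell server at `http://localhost:8000` and uses the multipart form fields of the workflow submission endpoint (`workflowSource`, `workflowInputs`, `workflowOptions`, `workflowType`, `workflowTypeVersion`). The helper names `build_submission_fields` and `submit_workflow` are hypothetical, as are the file names inside the form parts.

```python
import json

# Assumption: default host and port of a locally running Cromwell server.
CROMWELL_URL = "http://localhost:8000"


def build_submission_fields(workflow_text, inputs, options):
    """Build the multipart form parts for POST /api/workflows/v1."""
    return {
        "workflowSource": ("workflow.cwl", workflow_text, "text/plain"),
        "workflowInputs": ("inputs.json", json.dumps(inputs), "application/json"),
        # The options file travels as its own form part, not as a CLI flag.
        "workflowOptions": ("options.json", json.dumps(options), "application/json"),
        # A filename of None makes requests send a plain form field.
        "workflowType": (None, "CWL"),
        "workflowTypeVersion": (None, "v1.0"),
    }


def submit_workflow(fields, base_url=CROMWELL_URL):
    """Submit the workflow and return the ID assigned by the server."""
    import requests  # third-party; imported here so the builder stays stdlib-only

    response = requests.post(f"{base_url}/api/workflows/v1", files=fields)
    response.raise_for_status()
    return response.json()["id"]


options = {
    "final_workflow_outputs_dir": "/data/external/workflow_1/outputs",
    "use_relative_output_paths": True,
}
fields = build_submission_fields("cwlVersion: v1.0\n", {"collection_id": 1}, options)
```

Passing `fields` to `submit_workflow` sends the request. This shows only how the options reach the server; it does not change whether final_workflow_outputs_dir is honored for CWL workflows.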

svitkovsergey commented 5 years ago

@geoffjentry Very nice, thanks for the link! I wish I had known this earlier... :+1: Could this file also be provided to Cromwell when running integration tests via Centaur?

geoffjentry commented 5 years ago

@likeanowl Yes - you just need to specify it in the centaur test description, with a pointer to where the option file lives

svitkovsergey commented 5 years ago

@geoffjentry that's great, many thanks!

dpfoose commented 5 years ago

@likeanowl, @geoffjentry

I've been using the workflowOptions parameter in the API request and it still doesn't work (I mentioned this in the issue description, but it's kind of hidden). I'm not sure whether it's ignoring workflowOptions in general or just final_workflow_outputs_dir.

svitkovsergey commented 5 years ago

@dpfoose It could be related to https://github.com/broadinstitute/cromwell/issues/4982 then.

azzaea commented 4 years ago

I'd like to report that this is also an issue when running CWL workflows via Cromwell in run mode: the options argument is ignored.

$ java -jar ${crom} --version
cromwell 47
$
$ cat workflow.options.json
{
    "final_workflow_outputs_dir": "results.cromwell",
    "use_relative_output_paths": true
}
$
$ java -jar ${crom} run example.cwl -i inputs.yml --type cwl -o workflow.options.json
:
: # workflow runs normally, logs and other files in `cromwell-executions` folder as expected
: 
$ ls results.cromwell
$ # folder is empty
$
medcelerate commented 4 years ago

I can report we are having the same issue as well.

ghost commented 3 years ago

Same here. If you run with -t wdl it works, but not with -t cwl. How does changing the type change the code path for final_workflow_outputs_dir, or is this option completely removed from that engine?