ga4gh / workflow-execution-service-schemas

The WES API is a standard way to run and manage portable workflows.
Apache License 2.0
82 stars 38 forks source link

feat: add in support for workflow engine and version when submitting a request. Closes #181 #182

Closed vsmalladi closed 1 year ago

vsmalladi commented 1 year ago

feat: add in support for workflow engine and version when submitting a request. Closes #181

uniqueg commented 1 year ago

Thanks, this addresses -to some extent- an issue we were facing when implementing a WES for Snakemake - which doesn't have a language specification (at least not a versioned one, afaik). Which means that you rather need to specify an engine version. Our workaround is to use the engine version as the workflow_type_version, which is not ideal, for example because for engines it would make sense to specify ranges of versions that a workflow is known to play nice with.

Perhaps that is something that could still go into your PR though?

But then it still wouldn't fully address the issue unless the specs were also to specify that anyOf workflow_engine_version and workflow_type_version should be required (rather than only workflow_type_version). But that would likely be a breaking change...

Anyway, with support at least for an array of versions (even better would be a syntax like >=3.2 <=4.0), I think this is an important step in the right direction.

vsmalladi commented 1 year ago

@uniqueg Yes I think a range of versions makes sense.

As for the 'anyOf' I am open to that but not sure what that would ultimatley break. We could make 'None' option for language version for snakemake. What do you think?

uniqueg commented 1 year ago

I suppose for anyOf it would have to be either workflow_type_version or both of workflow_engine and workflow_engine_version. But I agree with you that it might break things, even though it may actually be backwards compatible.

Actually, come to think of it, I think the problem that providing a workflow engine version should require providing a workflow engine exists for the current PR as well. So perhaps it would require defining a schema WorkflowEngineVersion or some such (isn't there something like that in the Service Info specs for WES?) that then requires the engine name but has the version as optional.

vsmalladi commented 1 year ago

Yes the @uniqueg expanded Workflow engine version. Take a look.

cjllanwarne commented 1 year ago

From the text of #181: I'm not sure I like this being a required field. Whether or not it's required probably depends a lot on the specific WES environment.

For example: I anticipate continuous updating our engines behind the scenes and expect that people should always get the latest version of the engine no matter what.

vsmalladi commented 1 year ago

@cjllanwarne I think updating to have the latest version of engines makes sense. However I see two issues:

  1. Many people like Clinical labs will want to pin the workflow engine version down so that the pipeline is fully reproducible
  2. Certain pipelines will use syntax that is deprecated in newer versions and will either require different workflow options to run or will require an old version of the workflow engine to run. Example is Nextflow is deprecated DSL-1 support going forward and you will have to specificy a certain version of Nextflow to run these older workflow versions.
uniqueg commented 1 year ago

The PR in its current form looks good to me, with both workflow_engine and workflow_engine_version added as optional params.

The only minor problems I see are:

uniqueg commented 1 year ago

Oh, one more thing: given that the requestBody of the POST /runs endpoint does not make use of the RunRequest schema, I'm afraid that the workflow_engine and workflow_engine_version parameters need to be added there as well (see specs):

    requestBody:
      content:
        multipart/form-data:
          encoding: {}
          schema:
            type: object
            properties:
              workflow_params:
                type: string
              workflow_type:
                type: string
              workflow_type_version:
                type: string
              tags:
                type: string
              workflow_engine_parameters:
                type: string
              workflow_url:
                type: string
              workflow_attachment:
                type: array
                items:
                  type: string
                  format: binary
                description: ''
      required: false

And the description would have to be amended, too:

    description: >-
      This endpoint creates a new workflow run and returns a `RunId` to monitor its progress.
      The `workflow_attachment` array may be used to upload files that are required to execute the workflow, including the primary workflow, tools imported by the workflow, other files referenced by the workflow, or files which are part of the input.  The implementation should stage these files to a temporary directory and execute the workflow from there. These parts must have a Content-Disposition header with a "filename" provided for each part.  Filenames may include subdirectories, but must not include references to parent directories with '..' -- implementations should guard against maliciously constructed filenames.
      The `workflow_url` is either an absolute URL to a workflow file that is accessible by the WES endpoint, or a relative URL corresponding to one of the files attached using `workflow_attachment`.
      The `workflow_params` JSON object specifies input parameters, such as input files.  The exact format of the JSON object depends on the conventions of the workflow language being used.  Input files should either be absolute URLs, or relative URLs corresponding to files uploaded using `workflow_attachment`.  The WES endpoint must understand and be able to access URLs supplied in the input.  This is implementation specific.
      The `workflow_type` is the type of workflow language and must be "CWL" or "WDL" currently (or another alternative  supported by this WES instance).
      The `workflow_type_version` is the version of the workflow language submitted and must be one supported by this WES instance.
      See the `RunRequest` documentation for details about other fields.
vsmalladi commented 1 year ago

Update the PR for this.

vsmalladi commented 1 year ago

@uniqueg updated the PR to address your comments.

wleepang commented 1 year ago

I'm ok with workflow_engine and workflow_engine_version being optional in all cases - i.e. the RunWorkflow request and the /runs/{id} response which includes a copy of the RunWorkflow request.

I recognize these parameters are needed for workflow definition languages that are intimately tied to their engine versions. That said, I think we should be careful with the implications for WES implementations that do not wish to expose their underlying infrastructural architecture due to either security concerns or IP restrictions.

vsmalladi commented 1 year ago

Update branch to move since the original PR was outdated.

patmagee commented 1 year ago

There does not appear to be any one opposed to merging this into the spec as is, so I am considering this change accepted. Final acceptance of this feature will be done during review of the next release