common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0
34 stars 22 forks source link

Improve the definition of `source` for `WorkflowStepInput` #247

Open fmigneault opened 1 year ago

fmigneault commented 1 year ago

@mr-c I found this case: https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/1.2.1_proposed/tests/count-lines4-wf.cwl While working on https://github.com/opengeospatial/ogcapi-processes/pull/329

Which has:

in:
  file1: [file1, file2]

But the doc: https://www.commonwl.org/v1.2/Workflow.html#WorkflowStep Shows: in: array<WorkflowStepInput> | map<id, source | WorkflowStepInput>

While the outputSource: string | array<string> is explicitly defined as potentially a string/list of string in https://www.commonwl.org/v1.2/Workflow.html#WorkflowOutputParameter, the source does not show explicitly the array<string> variant.

There is this part of the text that identifies it as a possibility, but I find that the missing array in the definition makes its somewhat counter intuitive (and in fact, it was not a use case I covered until I got the error because I didn't notice it as something that was possible).

https://www.commonwl.org/v1.2/Workflow.html#Merging_multiple_inbound_data_links If the sink parameter is an array, or named in a workflow scatter operation, there may be multiple inbound data links listed in the source field. The values from the input links are merged depending on the method specified in the linkMerge field. If both linkMerge and pickValue are null or not specified, and there is more than one element in the source array, the default method is "merge_nested".

I propose to add the array variant explicitly to the table.

mr-c commented 1 year ago

The connection between WorkflowStep.in and WorkflowStepInput is defined at https://github.com/common-workflow-language/cwl-v1.2/blob/4a715dd348cd6fcb03f8002cc13048e5ce02c721/Workflow.yml#L574-L579

The source field is defined at https://github.com/common-workflow-language/cwl-v1.2/blob/4a715dd348cd6fcb03f8002cc13048e5ce02c721/Workflow.yml#L283-L293

(It is interesting that Sink is only used by WorkflowStepInput, @tetron is there any reason to not combine them directly? )

The definition of OutputSource is at https://github.com/common-workflow-language/cwl-v1.2/blob/4a715dd348cd6fcb03f8002cc13048e5ce02c721/Workflow.yml#L228-L241 And it is not involved with any abbreviated form.

If we replace source with string?, string[]? (the type definition of source) in the rendering of the in field then the reader won't have the conceptual meaning of the value.

Options could include

  1. Hyperlinking fields references to their definitions (having those anchors would be very useful, regardless). So both id and source would be hyperlinks to their entries under WorkflowStepInput.
  2. Hover text for field references showing the types (and maybe the label/doc for the field as well?)
  3. Listing the types as a parenthetical

    array\<WorkflowStepInput> | map<id (string), source (string?, string[]?) | WorkflowStepInput>

  4. Combining 1+2 or 1+3