common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0
34 stars 22 forks source link

Clarify type system #78

Open multimeric opened 3 years ago

multimeric commented 3 years ago

There are a few topics that you need to read to get your head around the CWL type system. Many of these are located in separate parts of the specification, which do not link to each other. These topics include:

I would like to suggest refactoring this in some way. My preference would be to add a 2.x subsection (part of the section 2: Data Model), called something like "Type System", which explains all of these concepts holistically, and in one place, and this section is then linked to by the various other parts of the spec that demand it.

Failing that, these sections should at least have a link to each other to facilitate an understanding of the whole type system.

multimeric commented 3 years ago

I also wonder if, as part of this, we could clarify if you can have an optional array, and if this makes sense. e.g. does string?[] work? I think we should just codify what Schema Salad currently does, I don't think it hugely matters one way or the other.

mr-c commented 3 years ago

I also wonder if, as part of this, we could clarify if you can have an optional array, and if this makes sense. e.g. does string?[] work? I think we should just codify what Schema Salad currently does, I don't think it hugely matters one way or the other.

My guess would be that string?[] isn't valid, but that needs testing. I don't quite know your intent here: do you want to accept either a single string or an array who items (if any) are string?

If so, I'd use the following

type:
  - string
  - type: array
    items: string

Or do you want an array who items are strings or nulls?

type:
  - type: array
    items: [ string, 'null' ]
mr-c commented 3 years ago

or do you want an optional array-of-strings?

type: string[]?  # this I do know to work

# same as

type:
  - 'null'
  - type: array
    items: string
multimeric commented 3 years ago

Well my own usage isn't so much the point here, I just think it's worth codifying if these shortcuts work together and what they would mean.

tom-tan commented 3 years ago

Related?: common-workflow-language/common-workflow-language#605

multimeric commented 3 years ago

Ah yes. Although your questions were answered in the issue, it looks like the clarifications never reached the spec, but I think it would be helpful if they did.

mr-c commented 3 years ago

I support better documentation in this area, yes! PRs with concrete proposals will be viewed favourably for inclusion in CWL v1.2.1.

The documentation is put together by https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/makedoc.py (also known as schema-salad-doc == schema-salad-tool --print-doc) following the mechanisms documented in https://www.commonwl.org/v1.2/SchemaSalad.html#Documentation (and elsewhere in that same document)

Here are our levers to adjust how the separate sections (some of which are just text, others are rendered schema and text) are ordered:

docParent

Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under docParent.

CWLType is set as the docParent of the File documentation, though File ends up being after CWLType but not as as sub-section. Possible bug in `makedoc.py``?

docChild

Hint to indicate that during documentation generation, documentation for docChild should appear in a subsection under this type.

CWLType (the current section 5.1.6) is set as the child of the BaseTypesDoc

But BaseTypesDoc is never referenced anywhere else, so CWLType gets put in after CommandInputParameter as that is the first place in CommandLineTool.yml to mention CWLType

finally we also have docAfter

Hint to indicate that during documentation generation, documentation for this type should appear after the docAfter section at the same level.

Back to your request to reorganize things:

There are a few topics that you need to read to get your head around the CWL type system. Many of these are located in separate parts of the specification, which do not link to each other. These topics include:

  • The list of CWL type symbols (5.1.6). This is located inside section 5.1 CommandInputParameter, even though it's also relevant to outputs (section 5.2) amongst others
  • The question mark notation e.g. string?, and the bracket notation, e.g. string[] (2.4)
  • The notion that "<T> or null" conceptually means "this value is optional". This is described broadly in the inputs field of section 5, but not explicitly

I would like to suggest refactoring this in some way. My preference would be to add a 2.x subsection (part of the section 2: Data Model), called something like "Type System", which explains all of these concepts holistically, and in one place, and this section is then linked to by the various other parts of the spec that demand it.

The beginning of the CWL Workflow and CommandLineTool HTML pages is defined by type: documentation node named WorkflowDoc and CommandLineToolDoc respectively. These *Doc sections contain markdown for the title, abstract, and changelog and include other common text shared between the two standards: contrib.md , intro.md, concepts.md (sections 1.4 - 3.7). CommandLineToolDoc also $includes invocation.md after concepts.md

Currently these type: documentation nodes in the CWL schema files can only contain pure prose markdown (both directly and via $include) and they can not have rendered schema parts like CWLType inside them, only before or after them.

So to include CWLType as a subsection or sub-sub-section of "2. Data model" we need to split up concepts.md at the desired insertion point, add docChild: CWLType to both WorkflowDoc and CommandLineToolDoc and add a WorkflowDoc2 and CommandLineToolDoc2 (both of type: documentation) to $include the 2nd half of concepts.md (concepts2.md?)

FYI: The docs are officially rendered into HTML via https://github.com/common-workflow-language/cwl-website/blob/main/website.sh which is kinda complicated to set up using local edits (or even a branch) especially for a casual check. Writing up instructions on how to do that more simply would be very appreciated.