common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0
33 stars 22 forks source link

Using json-schema/cwl.yaml to validate CWL files inside IDE is not working correctly. #278

Open alexiswl opened 7 months ago

alexiswl commented 7 months ago

Hello,

I was wondering if there is a way to use the json schema specified in cwl.yaml to auto-complete / lint my code before running cwltool validate.

I performed the following commands to get my json schema:

wget \
  --quiet \
  --output-document /dev/stdout \
  "https://raw.githubusercontent.com/common-workflow-language/cwl-v1.2/main/json-schema/cwl.yaml" | \
yq --output-format=json > cwl_mappings.json

In my JetBrains IDE, Under Languages & Frameworks -> Schemas and DTDs -> _JSON Schema Mappings, I configured the following:

image

After restarting my IDE, nothing changes.

Having a look at some other schema definitions (that do work in my IDE) such as the [GitHub Actions Workflow Json](https://[json.schemastore.org/github-workflow.json](https://json.schemastore.org/github-workflow.json), I noticed that this had many more higher level fields.
Notably the additionalProperties field was set to false.

I tried changing the additionalProperties and allOf field in the top level to the following:

allOf:
  - $ref: '#/$defs/CWL'
additionalProperties: false

so that the properties in the top level of the CWL file would need to match one of CWLWorkflow, CWLGraph or CWLAtomic

This worked! ...sort of.

It now thinks we're in the $graph version deployment.

image

This is likely caused because CWLGraphBase has additionalProperties: {}.

After updating all items of $defs that match type object to have additionalProperties: false and all items of $defs that match type array to have additionalItems: false.

Presto!

image

image

However, workflows still have issues for the class definition as it thinks it should be of the type 'CommandLineTool'

image

I'll keep tinkering around but would be great to see if this json schema could also be auto-generated.

mr-c commented 7 months ago

Tagging @fmigneault who hand wrote (!!) the existing json schema

Yes, it would be great if schema-salad or another tool could produce the JSON Schema version automatically!

fmigneault commented 7 months ago

I'm using an equivalent config in PyCharm with the reference to the remote YAML schema. I have not found a reliable way to validate all schemas at the same time. It seems JetBrain's IDEs are not handling additionalProperties: {} or additionalProperties: true properly, as if it is not able to rewind up in the schema hierarchy when a lower level requirement invalidates a specific definition in a oneOf. Disallowing additional properties seems to help, but I did not see anything in the spec that explicitly disallowed unknown fields.

alexiswl commented 7 months ago

Thanks @fmigneault I'm having the same issue using a conditional on the CWL attribute, I'll keep tinkering. I don't think it's a JetBrains issue though, I'm using an online json schema validator to compare and getting the same issue - https://www.jsonschemavalidator.net/

alexiswl commented 7 months ago

Three days in a rabbit hole! Very excited to see the daylight again!!

I've got the following working for me now!

https://github.com/common-workflow-lab/cwl-ts-auto/pull/27

fmigneault commented 7 months ago

@mr-c If I understand correctly, https://github.com/common-workflow-language/cwl-v1.2/blob/main/Workflow.yml is used with https://github.com/common-workflow-language/schema_salad to generate the definitions of https://github.com/common-workflow-lab/cwl-ts-auto?

Therefore, should the features proposed in https://github.com/common-workflow-lab/cwl-ts-auto/pull/27 be instead triggered by the CI of this repo, to invoke schema_salad and obtain the TS definitions, convert them to JSON-schema, validate the resulting JSON-schema against https://github.com/common-workflow-language/cwl-v1.2/blob/main/.github/workflows/ci.yml, and commit the updated version back to this repo? Not sure it makes much sense to have the JSON-schema live under https://github.com/common-workflow-lab/cwl-ts-auto.

alexiswl commented 7 months ago

I'm not sure how the cwl-ts-auto creation works, the repository's main branch hasn't been updated in a couple of years.

But happy to move it back to this repo instead. Would prefer to keep it as JSON as some of the IDE importers expect a url to a JSON file but can also convert back to YAML in addition for readability.

Would also be great to be able to register it with JSON Schema store as well

mr-c commented 7 months ago

@mr-c If I understand correctly, https://github.com/common-workflow-language/cwl-v1.2/blob/main/Workflow.yml is used with https://github.com/common-workflow-language/schema_salad to generate the definitions of https://github.com/common-workflow-lab/cwl-ts-auto?

Yes, but it is manually updated; we don't automatically regenerate when schema-salad and/or the CWL schema is updated. And we use https://github.com/common-workflow-language/cwl-v1.2/blob/main/CommonWorkflowLanguage.yml as the root schema

Therefore, should the features proposed in common-workflow-lab/cwl-ts-auto#27 be instead triggered by the CI of this repo, to invoke schema_salad and obtain the TS definitions, convert them to JSON-schema, validate the resulting JSON-schema against https://github.com/common-workflow-language/cwl-v1.2/blob/main/.github/workflows/ci.yml, and commit the updated version back to this repo? Not sure it makes much sense to have the JSON-schema live under https://github.com/common-workflow-lab/cwl-ts-auto.

I'm happy for the JSON-schema to live here, so it gets published to the website under https://w3id.org/cwl/ a.k.a. https://www.commonwl.org/v1.2/

However I would be uncomfortable about automatically updating it without some automated tests of the result.

mr-c commented 7 months ago

Would also be great to be able to register it with JSON Schema store as well

Cool, I didn't know about the JSON Schema Store. I would request that a URL like https://w3id.org/cwl/v1.2/cwl-schema.json (which would redirect to https://www.commonwl.org/v1.2/cwl-schema.json) or similar be registered once we are publishing the JSON Schema along with the existing v1.2 artifacts.

mr-c commented 7 months ago

The other option would be to teach schema-salad how to produce JSON schema directly, without going through the typescript generation→conversion→cleanup process. This will now be a lot easier as you would be able to compare the direct JSON schema serialization to the result of @alexiswl 's converter.

alexiswl commented 7 months ago

I'm happy for the JSON-schema to live here, so it gets published to the website under w3id.org/cwl a.k.a. commonwl.org/v1.2

Should that be a manual PR for now? Can copy over the outputs from the cwl-ts-auto GH actions workflow

fmigneault commented 7 months ago

The other option would be to teach schema-salad how to produce JSON schema directly

I agree this would be the best option. I didn't go through that path originally because I add a pre-made CWL JSON-schema equivalent, and it was faster to tweak it than implement schema-salad ⇾ JSON-schema from scratch, but it is what I would have done otherwise.

However I would be uncomfortable about automatically updating it without some automated tests of the result.

Once converted, it would be easy to validate the resulting JSON by passing the path here: https://github.com/common-workflow-language/cwl-v1.2/blob/b60a42e3cc417c5b75b88fd7c6681abcc7ff5b89/tests/json_schema/test_cwl_schema.py#L24 This will run it against all https://github.com/common-workflow-language/cwl-v1.2/blob/main/conformance_tests.yaml

alexiswl commented 7 months ago

Once converted, it would be easy to validate the resulting JSON by passing the path here:

Ah I missed this, built my own integration in GH actions instead here

fmigneault commented 5 months ago

@mr-c @alexiswl Has there been any update about publishing the JSON schema? Since the 1.2.1_proposed branch was merged, the $id and repos using this raw reference are now invalid.

https://github.com/common-workflow-language/cwl-v1.2/blob/b60a42e3cc417c5b75b88fd7c6681abcc7ff5b89/json-schema/cwl.yaml#L3

I would love to have an official: https://www.commonwl.org/v1.2/cwl-schema.json

alexiswl commented 5 months ago

@fmigneault currently using https://github.com/common-workflow-lab/cwl-ts-auto/blob/main/json_schemas/cwl_schema.yaml

Which is absorbed by https://github.com/SchemaStore/schemastore/blob/15c0a5141e10d0eb77f53db6d2222a5c35839575/src/api/json/catalog.json#L6146-L6151

alexiswl commented 5 months ago

I think an automatic PR to this repo whenever the json schema is updated in cwl-ts-auto could be feasible?

fmigneault commented 5 months ago

Maybe something can be configured with a push to gh-pages branch using https://github.com/peaceiris/actions-gh-pages? Usually, with a https://github.com/common-workflow-language/common-workflow-language.github.io, the https://github.com/common-workflow-language/cwl-v1.2 should be available as https://common-workflow-language.github.io/cwl-v1.2. However, https://github.com/common-workflow-language/cwl-website is also defined, and the site's structure is somewhat unusual.