common-workflow-language / cwl-utils

Python utilities for CWL
https://cwl-utils.readthedocs.io/
Apache License 2.0
36 stars 18 forks source link

schema_salad.exceptions.ValidationExceptions parsing certain data structures #31

Open leipzig opened 4 years ago

leipzig commented 4 years ago
import cwl_utils.parser_v1_2 as parser
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/picard/picard_CreateSequenceDictionary.cwl")
  InitialWorkDirRequirement:
    listing:
      - $(inputs.REFERENCE)
outputs:
  ...
  sequences_with_dictionary:
    type: File
    format: edam:format_2573  # SAM
    secondaryFiles:
      - .dict
      - .fai
parser.load_document("https://raw.githubusercontent.com/common-workflow-library/bio-cwl-tools/release/bamtools/bamtools_stats.cwl")
bamtools_stats.cwl:110:5:  the `outputBinding` field is not valid...
the `outputEval` field is not valid...
leipzig commented 4 years ago

secondaryFiles within outputs is not terribly common but listing should accept that expression https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/tests/test_schema/CommandLineTool.yml#L776-L781

stain commented 4 years ago

Probably not related, but you are testing with a v1.2 parser even though document declares cwlVersion: 1.2

tetron commented 4 years ago

@leipzig so what's actually happening here is that codegen was never updated for the schema salad v1.1, which added a few features that are used for CWL 1.1 and 1.2. Specifically:

  1. secondaryFilesDSL
  2. Special treatment of the Expression type
  3. subscopes (used for assigning identifiers)
  4. default values

The first two are causing the problems you're reporting. Some suggestions:

  1. You need to add a case for secondaryFilesDSL, see https://github.com/common-workflow-language/schema_salad/blob/main/schema_salad/codegen.py#L113 and https://www.commonwl.org/v1.1/SchemaSalad.html#Domain_Specific_Language_for_secondary_files
  2. You need to add a special case for the Expression type https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L285 and add an _ExpressionLoader which should pretty much just check if the value is a string
  3. declare_field https://github.com/common-workflow-language/schema_salad/blob/af9842edc3e55723bfd0c50562ab44c7daa35226/schema_salad/python_codegen.py#L368 needs an extra subscope parameter, if the subscope is non-empty, then it gets appended to baseuri when loading the field
  4. begin_class should be extended to so that fields with default or optional are sorted to the end of the __init__ parameters and assigned None, and then in the body of __init__ any fields with default that are None should be assigned the default value.
tetron commented 4 years ago

What you should do is make a PR against schema-salad that fixes these issues, and a PR against cwl-utils with updated v1.1 and v1.2 parsers and test cases of previously-failing v1.1 and v1.2 files.

leipzig commented 4 years ago

@tetron I am a bit confused about number 3. Does it apply only to declare_id_field or declare_field as well. It seems the subscope argument is something that appears as when jsonldPredicate is a list, so it would be more relevant to declare_field

      jsonldPredicate:
        _id: "cwl:run"
        _type: "@id"
        subscope: run

but that when it's jsonldPredicate: "@id" we are talking about workflow id

tetron commented 4 years ago

@leipzig When the predicate is @id that means that field is the identifier for the object it appears in.

A field with _type: "@id" means it is a reference to an identifier.

A subscope means that jsonldPredicate: "@id" fields get an extra path element to avoid identifier name conflicts with the parent. This is done by adding the subscope to the baseuri that is used to construct identifiers in the nested objects.

Does that help?

leipzig commented 4 years ago

Should I just concatenate baseuri+subscope?

tetron commented 4 years ago

Pretty much. Here's what ref_resolver.py does

subscope = ""  # type: str
if key in loader.subscopes:
    subscope = "/" + loader.subscopes[key]
document[key], _ = loader.resolve_all(
    val, base_url + subscope, file_base=file_base, checklinks=False
)
tetron commented 4 years ago

(the important part is that it also adds a slash)

mr-c commented 3 years ago

@leipzig Are you still working on this? Is there a branch of what you've tried already?

leipzig commented 3 years ago

@mr-c i have some progress on this branch https://github.com/leipzig/schema_salad/tree/main can you point me to a 1.2 example on hand with all of these features?

mr-c commented 3 years ago

@leipzig Huzzah!

Here are some CWL v1.2 examples. courtesy @jdidion

https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-sd-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/anon_enum_inside_array_inside_schemadef.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename_and_entryname.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/docker-array-secondaryfiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/optional-output.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-in-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdirrequirement-docker-out.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-outputs.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_file_array_basename.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/stage_array.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/initialworkdir-glob-fullpath.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/record-out-secondaryFiles.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/secondaryfiles/rename-inputs.yml https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/params2.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/schemadef-tool.cwl https://github.com/common-workflow-language/cwl-v1.2/blob/main/tests/imported-hint.cwl