common-workflow-language / cwltool

Common Workflow Language reference implementation
https://cwltool.readthedocs.io/
Apache License 2.0

Input `format` expression not considering `$namespaces` #2033

Open fmigneault opened 2 months ago

fmigneault commented 2 months ago

Expected Behavior

When an input's format is given as a JS expression (a workaround for https://github.com/common-workflow-language/cwl-v1.3/issues/52), the format check should first resolve any $namespaces prefix in the evaluated result.

Actual Behavior

The format check fails if the input file format and the evaluated format do not match exactly. Since $namespaces can be used to write equivalent formats (e.g., https://www.iana.org/assignments/media-types/application/geo+json and iana:application/geo+json), they should be treated interchangeably. However, the format check fails unless the expression returns the format in its long form (i.e., the full URI).

Inputs submitted in job.yml with either the long-form URI or the namespaced format are both converted to the long-form URI before they reach the check below. As a result, the JS expression must return the long-form URI to be considered valid. https://github.com/common-workflow-language/cwltool/blob/6d8c2a41e2c524e8d746020cc91711ecc3418a23/cwltool/builder.py#L555-L559

However, for a user writing a CWL document that defines a $namespaces section, it is very counter-intuitive to have to use the long-form URI only in the format expression, when everywhere else accepts iana:application/geo+json.
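
To illustrate the expected behavior, here is a minimal sketch of a comparison that expands prefixes on both sides before matching. It is not cwltool's or cwl_utils' actual code: `expand_format` and `formats_match` are hypothetical helper names, the `namespaces` dict stands in for the document's $namespaces section, and ontology-based matching is ignored entirely.

```python
from typing import Mapping


def expand_format(value: str, namespaces: Mapping[str, str]) -> str:
    """Expand a 'prefix:rest' format into a full URI when the prefix is declared.

    Values whose prefix is not declared in `namespaces` (including full URIs
    such as 'https://...') are returned unchanged.
    """
    prefix, sep, rest = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + rest
    return value


def formats_match(actual: str, expected: str, namespaces: Mapping[str, str]) -> bool:
    """Compare two format identifiers after expanding both sides."""
    return expand_format(actual, namespaces) == expand_format(expected, namespaces)


# Stand-in for the $namespaces section of echo_features.cwl.
namespaces = {"iana": "https://www.iana.org/assignments/media-types/"}
long_form = "https://www.iana.org/assignments/media-types/application/geo+json"
short_form = "iana:application/geo+json"

print(formats_match(long_form, short_form, namespaces))
```

With a comparison along these lines, the expression could return either form and still match the long-form URI that the job input has been converted to.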

Workflow Code

job.yml

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "https://www.iana.org/assignments/media-types/application/geo+json"

OR

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "iana:application/geo+json"

echo_features.cwl

cwlVersion: "v1.2"
class: CommandLineTool
$namespaces:
  iana: "https://www.iana.org/assignments/media-types/"
baseCommand: echo
requirements:
  InlineJavascriptRequirement: {}
  DockerRequirement:
    dockerPull: "debian:stretch-slim"
inputs:
  features:
    type:
      - "File"
      - type: array
        items: File
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "iana:application/geo+json";  // (!) here is the problematic format, unless the full URI is given
        }
        return "http://www.opengis.net/def/glossary/term/FeatureCollection";
      }
    inputBinding:
      valueFrom: |
        ${
          if (Array.isArray(inputs.features)) {
            return {
              "type": "FeatureCollection",
              "features": inputs.features.every(item => item.contents)
            };
          }
          return inputs.features.contents;
        }
outputs:
  features:
    type: File
    format: "http://www.opengis.net/def/glossary/term/FeatureCollection"
    outputBinding:
      glob: "features.json"
stdout: "features.json"

Full Traceback

❯ cwllog --debug echo_features.cwl job.yml
Running:  [cwltool --disable-color --debug echo_features.cwl job.yml 2>&1 | tee echo_features.log]
Log Path: [/tmp/echo_features.log]
INFO /home/francis/dev/conda/envs/weaver/bin/cwltool 3.1.20230906142556
INFO Resolved 'echo_features.cwl' to 'file:///tmp/echo_features.cwl'
URI prefix '${
  if (Array.isArray(inputs.features)) {
    return "iana' of '${
  if (Array.isArray(inputs.features)) {
    return "iana:application/geo+json";
  }
  return "http://www.opengis.net/def/glossary/term/FeatureCollection";
}
' not recognized, are you missing a $namespaces section?
URI prefix '${
  if (Array.isArray(inputs.features)) {
    return "iana' of '${
  if (Array.isArray(inputs.features)) {
    return "iana:application/geo+json";
  }
  return "http://www.opengis.net/def/glossary/term/FeatureCollection";
}
' not recognized, are you missing a $namespaces section?
echo_features.cwl:9:3: object id 'echo_features.cwl#features' previously defined
WARNING echo_features.cwl:22:7: JSHINT:       "features": inputs.features.every(item => item.contents)
echo_features.cwl:22:7: JSHINT:                                              ^
echo_features.cwl:22:7: JSHINT: W119: 'arrow function syntax (=>)' is only available in ES6. CWL only supports ES5.1
ERROR Workflow error:
Expected value of 'features' to have format '${\n  if (Array.isArray(inputs.features)) {\n    return "iana:application/geo+json";\n  }\n  return "http://www.opengis.net/def/glossary/term/FeatureCollection";\n}\n' but
 File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/main.py", line 1298, in main
    (out, status) = real_executor(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 62, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 145, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 560, in bind_input
    raise WorkflowException(
cwltool.errors.WorkflowException: Expected value of 'features' to have format '${\n  if (Array.isArray(inputs.features)) {\n    return "iana:application/geo+json";\n  }\n  return "http://www.opengis.net/def/glossary/term/FeatureCollection";\n}\n' but
 File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

Your Environment

fmigneault commented 2 months ago

Error on my end.

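When the job file matches the structure the expression expects (a single File tagged as FeatureCollection, or an array of Files tagged as GeoJSON), the namespaced and long-form format values both work and are interchangeable.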

single "FeatureCollection" File

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "ogc-term:FeatureCollection"

OR

features:
  class: File
  path: /tmp/feature-0.geojson
  format: "http://www.opengis.net/def/glossary/term/FeatureCollection"

array of "feature" Files

features:
  - class: File
    path: /tmp/feature-0.geojson
    format: "iana:application/geo+json"

OR

features:
  - class: File
    path: /tmp/feature-0.geojson
    format: "https://www.iana.org/assignments/media-types/application/geo+json"

fmigneault commented 2 months ago

Further investigation reveals that this is actually still an issue.

More specifically, if the input is defined as follows, the job succeeds whether job.yml uses the full URI or the namespaced variant.

inputs:
  features:
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "https://www.iana.org/assignments/media-types/application/geo+json"; 
        }
        return "http://www.opengis.net/def/glossary/term/FeatureCollection";
      }

However, if the CWL expression returns the namespaced format, as below, the job always fails, regardless of which format variant job.yml provides.

inputs:
  features:
    format: |
      ${
        if (Array.isArray(inputs.features)) {
          return "iana:application/geo+json";
        }
        return "ogc-term:FeatureCollection";
      }

With the namespaced format in the CWL expression, one of two errors occurs, depending on the format provided in job.yml.

  1. The job format is also the namespaced value (trying to do an == match with the evaluated CWL expression). The error is simply the generic schema_salad.exceptions.ValidationException: File has an incompatible format.

  2. The job format is the full URI. This causes a parsing error (looking for some name key?) with the following traceback.

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 288, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 355, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 561, in bind_input
    f"Expected value of {schema['name']!r} to have "
KeyError: 'name'
ERROR Workflow error:
'name'
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 554, in bind_input
    check_format(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwl_utils/file_formats.py", line 70, in check_format
    raise ValidationException(
schema_salad.exceptions.ValidationException: File has an incompatible format: {
    "class": "File",
    "format": "https://www.iana.org/assignments/media-types/application/geo+json",
    "location": "file:///tmp/feature-0.geojson",
    "size": 162,
    "basename": "feature-0.geojson",
    "nameroot": "feature-0",
    "nameext": ".geojson"
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 218, in run_jobs
    for job in jobiter:
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/command_line_tool.py", line 963, in job
    builder = self._init_job(job_order, runtimeContext)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/process.py", line 888, in _init_job
    builder.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 330, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 262, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 288, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 355, in bind_input
    self.bind_input(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/builder.py", line 561, in bind_input
    f"Expected value of {schema['name']!r} to have "
KeyError: 'name'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/main.py", line 1298, in main
    (out, status) = real_executor(
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 62, in __call__
    return self.execute(process, job_order_object, runtime_context, logger)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 145, in execute
    self.run_jobs(process, job_order_object, logger, runtime_context)
  File "/home/francis/dev/conda/envs/weaver-py310/lib/python3.10/site-packages/cwltool/executors.py", line 252, in run_jobs
    raise WorkflowException(str(err)) from err
cwltool.errors.WorkflowException: 'name'
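
In the traceback above, the KeyError: 'name' is raised while the error message is being built, so it hides the underlying ValidationException. The sketch below shows one defensive way to build such a message without assuming the schema has a name key. It is only an illustration: `format_error_message` and the `<unnamed input>` fallback label are hypothetical, and the idea that the nameless schema is the item schema of the array input is an assumption based on the nested bind_input calls.

```python
from typing import Any, Mapping


def format_error_message(schema: Mapping[str, Any], expected: Any, cause: Exception) -> str:
    """Build an 'incompatible format' message without assuming schema['name'] exists."""
    # Hypothetical fallback label for schemas that carry no 'name' key
    # (assumed here to be the item schema of an array input).
    name = schema.get("name", "<unnamed input>")
    return f"Expected value of {name!r} to have format {expected!r} but\n {cause}"


item_schema = {"type": "File"}  # no 'name' key, as in the failing case
message = format_error_message(
    item_schema,
    "iana:application/geo+json",
    ValueError("File has an incompatible format"),
)
print(message)
```

This would only make the second error as readable as the first one. The format mismatch itself would remain until the namespaced value is expanded before comparison.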