common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0
34 stars 22 forks source link

Spec conflict?: `LoadListingRequirement` and `Directory.listing` #75

Open tom-tan opened 3 years ago

tom-tan commented 3 years ago

The spec of LoadListingEnum used in LoadListingRequirement says:

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

On the other hand, the spec of Directory says that:

If listing is not provided, the implementation must have some way of fetching the Directory listing at runtime based on the location field.

I am not sure I can say it a "conflict" but at least CWL v1.0, that has the same description of listing, implicitly requires the platforms loading the Directory listing recursively.

It would be nice if the section of Directory mentions loadListing and LoadListingRequirement.

mr-c commented 3 years ago

Thank you @tom-tan for spotting this. I would positively view a PR to https://github.com/common-workflow-language/cwl-v1.2/blob/main/Process.yml#L426 to do as you suggest.

tom-tan commented 3 years ago

OK, I will send a request later.

Can I clarify the spec about the listing to do that? The spec says it is optional and its type is array<File | Directory>. Also, the type field of listing in the schema is (of course it is the same as the website of the spec):

    - name: listing
      type:
        - "null"
        - type: array
          items: [File, Directory]

I expect the listing field with null value in the case of no_listing but the conformance test #243 requires it should be undefined. Does it have different interpretation from null or can we fix the tests by replacing such undefined with null?

mr-c commented 3 years ago

@tom-tan That's a good point:

What does each of the following mean for a class: Directory?

  1. listing: null
  2. listing: []
  3. and no listing at all

The test at https://github.com/common-workflow-language/cwl-v1.2/blob/f3ec4d1d04f5ccb7983f7d1fdd2e08ec96c69bc6/tests/listing_none2.cwl#L14 tests for the 3rd case (as that is how to detect a missing property in Javacsript) ; that is to say, it tests for an unspecified directory listing.

To me, case 1 and case 2 both mean that this Directory is empty. You are asking if case 1 (listing is present, but the value is an explicit null) should have the same meaning as case 3 (no listing at all): that the Directory is not necessarily empty, it is just unspecified.

tom-tan commented 3 years ago

Can I clarify the term "unspecified"? Which is what you are intended in the comment?

  1. listing property returns a platform-dependent value,
  2. it behaves as listing property does not provided, or
  3. it behaves as listing property has undefined value.
tom-tan commented 3 years ago

By the way,

To me, case 1 and case 2 both mean that this Directory is empty.

It makes me confused. For example, an input object param1: null is distinct from param1: [] in CWL spec (even if the resulted command line options are same). I prefer that listing: null and listing: [] have distinct interpretations because it makes easier to implement platforms especially with statically typed languages.

mr-c commented 3 years ago

Earlier I said

The test at https://github.com/common-workflow-language/cwl-v1.2/blob/f3ec4d1d04f5ccb7983f7d1fdd2e08ec96c69bc6/tests/listing_none2.cwl#L14 tests for the 3rd case (as that is how to detect a missing property in Javacsript) ; that is to say, it tests for an unspecified directory listing.

@tom-tan asks

Can I clarify the term "unspecified"? Which is what you are intended in the comment?

The CWL conformance test referenced above checks that the Directory object lacks (does not have) a property named "listing" as loadListing: no_listing is written under LoadListingRequirement.

So we should clarify https://www.commonwl.org/v1.2/Workflow.html#LoadListingEnum to say

symbol description
no_listing Do not include a "listing" key in "Directory" objects.
shallow_listing Include a "listing" key in "Directory" objects, but only include one level of contents; do not recurse into subdirectories.
deep_listing Include a "listing" key in "Directory" objects and recursively load all subdirectories as well.
mr-c commented 3 years ago

By the way,

To me, case 1 and case 2 both mean that this Directory is empty.

It makes me confused. For example, an input object param1: null is distinct from param1: [] in CWL spec (even if the resulted command line options are same). I prefer that listing: null and listing: [] have distinct interpretations because it makes easier to implement platforms especially with statically typed languages.

We need explicit meanings for all three. I think we have good meanings for Case 2 (listing: [], this Directory is empty) and Case 3 (no listing attribute due to loadListing: no_listing, the contents of this Directory object are not available to CWL parameter references nor to CWL expressions).

What meaning would you ascribe to Case 1 (listing: null, the listing property is present on the Directory object and its value is a literal JSON/YAML null)?

I can't think of another useful meaning for Case 1. We can't forbid it in CWL v1.2, as null is explicitly allowed: https://github.com/common-workflow-language/cwl-v1.2/blob/e65a2de29828799d8ab29886bf7964655500e8f4/Process.yml#L422

mr-c commented 3 years ago

On the other hand, the spec of Directory says that:

If listing is not provided, the implementation must have some way of fetching the Directory listing at runtime based on the location field.

I am not sure I can say it a "conflict" but at least CWL v1.0, that has the same description of listing, implicitly requires the platforms loading the Directory listing recursively.

It would be nice if the section of Directory mentions loadListing and LoadListingRequirement.

Agreed, there should be a mention in https://github.com/common-workflow-language/cwl-v1.2/blob/e65a2de29828799d8ab29886bf7964655500e8f4/Process.yml#L425 about loadListing, the meaning of a null value versus leaving out the listing field entirely.

For example, if when loadListing: no_listing and a CWL expression adds a listing to a received class: Directory object and returns that modified object, then a new class: Directory needs to be made (perhaps at a new location) with the new listing contents, instead of the previous contents which were hidden.

This would make a good conformance test, I think.

tom-tan commented 3 years ago

What meaning would you ascribe to Case 1 (listing: null, the listing property is present on the Directory object and its value is a literal JSON/YAML null)?

I interpreted the following sentences in the spec

Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

and

Do not load the directory listing.

as: listing field is loaded but its contents are not loaded (Important note: for statically typed programmer, it is hard to guess omitting the field itself unless it is clarified).

Also I misunderstood that the expression $(inputs.d.listing === undefined) represents "inputs.d.listing has undefined value" because I am not familiar with the detail of JavaScript. IMO it is hard for non-JavaScript programmers to guess this behavior because it seems to compare the value rather than to check the existence of the field.


I can't think of another useful meaning for Case 1. We can't forbid it in CWL v1.2, as null is explicitly allowed:

Unfortunately it is hard to fix it, I guess. The SALAD (and CWL spec that depends on SALAD) uses the null type in the following distinct meanings:

As a result of that, schema-salad-tool passes the following CWL document:

#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
requirements:
  LoadListingRequirement:
    loadListing: no_listing
  InlineJavascriptRequirement: {}
inputs:
  d: Directory
outputs:
  out:
    type: boolean
    outputBinding:
      outputEval: $(inputs.d.listing === undefined)
baseCommand: "true"
hints: null   # !!!!????
$ schema-salad-tool CommonWorkflowLanguage.yml tests/listing_none2.cwl 
/usr/local/bin/schema-salad-tool Current version: 7.0.20201119201711
Document `tests/listing_none2.cwl` is valid

I'm not sure it is a limitation of SALAD itself or it comes from JSON-LD or Avro.

tom-tan commented 3 years ago

IMO treating listing: null as "listing field is missing" is reasonable in the current situation because it needs less implementation changes (SALAD already do that as explained above) and is consistent with other parts of the spec of CWL such as omitting rules for input and output parameters.

mr-c commented 3 years ago

@tetron Do you think that listing: null is equivalent to "listing field is not present"?

tetron commented 3 years ago

I agree. We've been trying to avoid the distinction between "null" and "undefined" in the data model, so things that are "undefined" are considered "null". That's also a much better policy for code generation where the fields already exist but can have a "null" value.

mr-c commented 3 years ago

I agree. We've been trying to avoid the distinction between "null" and "undefined" in the data model, so things that are "undefined" are considered "null". That's also a much better policy for code generation where the fields already exist but can have a "null" value.

Is that a CWL v1.2.1 clarification or a CWL v1.3/2.0 change?

tetron commented 3 years ago

I would call it a clarification, I'm trying to find the reference (assuming we wrote it down) that fields not specified in input objects get turned into null.

mr-c commented 3 years ago

@tetron @tom-tan Upon review of all the optional fields and references to null it appears to be safe to state that fieldname: null is the same as that fieldname not being specified (and vice-versa). I made a separate issue to track that: https://github.com/common-workflow-language/cwl-v1.2/issues/87

bogdang989 commented 3 years ago

On this note, there seems to be a conflict in conformance test 86 if I am not mistaken. The default behaviour for loadListing is no_listing (https://www.commonwl.org/v1.2/CommandLineTool.html#CommandOutputBinding), so the listing property should not be present at all. The test 86 (https://github.com/common-workflow-language/cwl-v1.2/blob/main/conformance_tests.yaml#L1178) globs a directory and expected output contains the listing property

  output:
    "outdir": {
        "class": "Directory",
        "listing": [
            {
              "class": "File",
              "location": "goodbye.txt",
                "checksum": "sha1$dd0a4c4c49ba43004d6611771972b6cf969c1c01",
                "size": 24
            },
            {
              "class": "File",
              "location": "hello.txt",
                "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b",
                "size": 13
            }
        ],
    }

so the test fails if listing is missing, but it should be missing because the output is defined as

outputs:
  outdir:
    type: Directory
    outputBinding:
      glob: .

We can simply add load_listing: shallow_listing to the test to handle this, but on the topic of this thread it confuses me how this test passes anywhere if loadListing is implemented this way, am I missing something?

azat-badretdin commented 3 years ago

The default behaviour for loadListing is no_listing

Looks like this default behavior changed between v1.0 and v1.2 of CWL specs, which I found experimentally recently while transitioning our 300 CWL files from v1.0 to v1.2.

I see in https://www.commonwl.org/v1.1/Workflow.html#Changelog

Added LoadListingRequirement and loadListing to control whether and how Directory listings should be loaded for use in expressions.

but it is not clear that the default changed.

Thanks for adding conditional execution in 1.2!

mr-c commented 3 years ago

while transitioning our 300 CWL files from v1.0 to v1.2

Interesting! Any particular reason you did a mass upgrade? Did you use https://github.com/common-workflow-language/cwl-upgrader/ ?

mr-c commented 3 years ago

FYI, the cwl-upgrader does set loadListing: deep_listing when upgrading to CWL v1.2 if there was no cwltool#LoadListingRequirement extension already used in the pre v1.2 document.

azat-badretdin commented 3 years ago

Thank you for your reply, Michael.

Any particular reason you did a mass upgrade?

Conditional execution. We did not use the upgrader. Didn't know it existed (our fault).

extension already used in the pre v1.2 document

Right. It was mentioned first in v1.1, AFAIK.

Thank you for letting us know about the existence of the upgrader, beats perl -i -pe 's{^cwlVersion:\s+1.0}{cwlVersion: 1.2}g' :-) .

We are looking forward to using it when CWL 1.3 will come (and 2.0, and 3.0 and further on).

Azat

PS. When I was thinking about conditional execution, I was thinking about:

run: Expression
runChoices: [flow.cwl, flow2.cwl]

where "to do or not to do" are equivalent workflows, but that would lead potentially to combinatorial explosion over runChoices in multiple condition points in code during validation.

Anyway, the ship has sailed.

mr-c commented 3 years ago

Conditional execution.

Cool! For reference, you can mix v1.0 and v1.1, and v1.2 CommandLineTools, ExpressionTools, and sub-Workflows in a v1.2 Workflow. We even have tests for version mixing as part of the CWL v1.2 conformance tests.

Didn't know [that cwl-upgrader] existed

No worries. I updated the description on https://www.commonwl.org to make it more obvious. I've also opened an issue to mention cwl-upgrader and explain version mixing as part of the upcoming CWL v1.2.1 update

Right. It was mentioned first in v1.1, AFAIK.

Yep, I mispoke; I even linked to the v1.0 to v1.1 code without noticing it :-)

We are looking forward to using it when CWL 1.3 will come (and 2.0, and 3.0 and further on).

Huzzah :-)

our 300 CWL files

Are these public? It would be great to have the CommandLineTools donated to https://github.com/common-workflow-library/bio-cwl-tools

azat-badretdin commented 3 years ago

For reference, you can mix

Good to know for a stopgap measure, but my practice is to avoid stopgaps as much as possible.

It would be great to have the CommandLineTools donated

Sure. The caveat is that they are heavily ASN.1-centric tools. ASN.1 is our go-to format of data internally for things related to biomolecules and other data. Unfortunately, this format is not very popular outside of NCBI.

People can still try to rework the workflows within our docker containers to build on PGAP by mounting their own CWL directories. For example, in my current addition of a new workflow to internal (in internal workflow format) and external (CWL) PGAP, 80% of tools were already there. As usual, remaining 20% is a bottleneck, especially to people unfamiliar with NCBI toolkit.

mr-c commented 3 years ago

but my practice is to avoid stopgaps as much as possible.

:+1:

The caveat is that they are heavily ASN.1-centric tools.

Sure, we are still happy to have them!

adamnovak commented 3 years ago

I think there might be a similar conflict in 1.2 conformance test ID 131.

The tool asks for a Directory input with no load_listing key, and has no LoadListingRequirement itself, meaning that by default it should see no listing, right?

https://github.com/common-workflow-language/cwl-v1.2/blob/5f27e234b4ca88ed1280dedf9e3391a01de12912/tests/dynresreq-dir.cwl#L11-L12

But then in its requirements section it accesses the listing anyway:

https://github.com/common-workflow-language/cwl-v1.2/blob/5f27e234b4ca88ed1280dedf9e3391a01de12912/tests/dynresreq-dir.cwl#L5-L9

The test happens to run it with a Directory literal that specifies a listing, and I think cwltool might just pass that listing along:

https://github.com/common-workflow-language/cwl-v1.2/blob/5f27e234b4ca88ed1280dedf9e3391a01de12912/tests/dynresreq-dir-job.yaml#L1-L6

azat-badretdin commented 3 years ago

Are these public?

It looks like I did not post the obvious answer to this. Sorry about that.

Yes, they are posted here at Github: https://github.com/ncbi/pgap

mr-c commented 3 years ago

On this note, there seems to be a conflict in conformance test 86 if I am not mistaken. The default behaviour for loadListing is no_listing (https://www.commonwl.org/v1.2/CommandLineTool.html#CommandOutputBinding), so the listing property should not be present at all. The test 86 (https://github.com/common-workflow-language/cwl-v1.2/blob/main/conformance_tests.yaml#L1178) globs a directory and expected output contains the listing property

so the listing property should not be present at all.

Only for CWL expressions.

loadListing: Specify the desired behavior for loading the listing field of a Directory object for use by expressions.

(emphasis added)

The final output object must have all Directory listings recursively expanded, as the locations of each File may be very different paths. I've opened https://github.com/common-workflow-language/cwl-v1.2/issues/108 to clarify this implicit requirement for CWL v1.2.1

mr-c commented 3 years ago

I think there might be a similar conflict in 1.2 conformance test ID 131.

The tool asks for a Directory input with no load_listing key, and has no LoadListingRequirement itself, meaning that by default it should see no listing, right?

I think you've found a bug, yes.

cwltool should trim listings in this case & the conformance test should be adjusted for CWL v1.2.1; likewise a conformance test to confirm the trimming in this scenario should also be added.