Open tom-tan opened 3 years ago
Thank you @tom-tan for spotting this. I would positively view a PR to https://github.com/common-workflow-language/cwl-v1.2/blob/main/Process.yml#L426 to do as you suggest.
OK, I will send a request later.
Can I clarify the spec about the listing
to do that?
The spec says it is optional and its type is array<File | Directory>
. Also, the type
field of listing
in the schema is (of course it is the same as the website of the spec):
- name: listing
type:
- "null"
- type: array
items: [File, Directory]
I expect the listing
field with null
value in the case of no_listing
but the conformance test #243 requires it should be undefined
.
Does it have different interpretation from null
or can we fix the tests by replacing such undefined
with null
?
@tom-tan That's a good point:
What does each of the following mean for a class: Directory
?
listing: null
listing: []
listing
at allThe test at https://github.com/common-workflow-language/cwl-v1.2/blob/f3ec4d1d04f5ccb7983f7d1fdd2e08ec96c69bc6/tests/listing_none2.cwl#L14 tests for the 3rd case (as that is how to detect a missing property in Javacsript) ; that is to say, it tests for an unspecified directory listing.
To me, case 1 and case 2 both mean that this Directory
is empty. You are asking if case 1 (listing
is present, but the value is an explicit null
) should have the same meaning as case 3 (no listing
at all): that the Directory
is not necessarily empty, it is just unspecified.
Can I clarify the term "unspecified"? Which is what you are intended in the comment?
listing
property returns a platform-dependent value,listing
property does not provided, orlisting
property has undefined
value.By the way,
To me, case 1 and case 2 both mean that this
Directory
is empty.
It makes me confused. For example, an input object param1: null
is distinct from param1: []
in CWL spec (even if the resulted command line options are same). I prefer that listing: null
and listing: []
have distinct interpretations because it makes easier to implement platforms especially with statically typed languages.
Earlier I said
The test at https://github.com/common-workflow-language/cwl-v1.2/blob/f3ec4d1d04f5ccb7983f7d1fdd2e08ec96c69bc6/tests/listing_none2.cwl#L14 tests for the 3rd case (as that is how to detect a missing property in Javacsript) ; that is to say, it tests for an unspecified directory listing.
@tom-tan asks
Can I clarify the term "unspecified"? Which is what you are intended in the comment?
The CWL conformance test referenced above checks that the Directory object lacks (does not have) a property named "listing"
as loadListing: no_listing
is written under LoadListingRequirement
.
So we should clarify https://www.commonwl.org/v1.2/Workflow.html#LoadListingEnum to say
symbol | description |
---|---|
no_listing | Do not include a "listing" key in "Directory" objects. |
shallow_listing | Include a "listing" key in "Directory" objects, but only include one level of contents; do not recurse into subdirectories. |
deep_listing | Include a "listing" key in "Directory" objects and recursively load all subdirectories as well. |
By the way,
To me, case 1 and case 2 both mean that this
Directory
is empty.It makes me confused. For example, an input object
param1: null
is distinct fromparam1: []
in CWL spec (even if the resulted command line options are same). I prefer thatlisting: null
andlisting: []
have distinct interpretations because it makes easier to implement platforms especially with statically typed languages.
We need explicit meanings for all three. I think we have good meanings for Case 2 (listing: []
, this Directory
is empty) and Case 3 (no listing
attribute due to loadListing: no_listing
, the contents of this Directory
object are not available to CWL parameter references nor to CWL expressions).
What meaning would you ascribe to Case 1 (listing: null
, the listing
property is present on the Directory
object and its value is a literal JSON/YAML null
)?
I can't think of another useful meaning for Case 1. We can't forbid it in CWL v1.2, as null
is explicitly allowed: https://github.com/common-workflow-language/cwl-v1.2/blob/e65a2de29828799d8ab29886bf7964655500e8f4/Process.yml#L422
On the other hand, the spec of
Directory
says that:If
listing
is not provided, the implementation must have some way of fetching the Directory listing at runtime based on thelocation
field.I am not sure I can say it a "conflict" but at least CWL v1.0, that has the same description of
listing
, implicitly requires the platforms loading the Directory listing recursively.It would be nice if the section of
Directory
mentionsloadListing
andLoadListingRequirement
.
Agreed, there should be a mention in https://github.com/common-workflow-language/cwl-v1.2/blob/e65a2de29828799d8ab29886bf7964655500e8f4/Process.yml#L425 about loadListing
, the meaning of a null
value versus leaving out the listing
field entirely.
For example, if when loadListing: no_listing
and a CWL expression adds a listing
to a received class: Directory
object and returns that modified object, then a new class: Directory
needs to be made (perhaps at a new location
) with the new listing
contents, instead of the previous contents which were hidden.
This would make a good conformance test, I think.
What meaning would you ascribe to Case 1 (
listing: null
, thelisting
property is present on theDirectory
object and its value is a literal JSON/YAML null)?
I interpreted the following sentences in the spec
Specify the desired behavior for loading the listing field of a Directory object for use by expressions.
and
Do not load the directory listing.
as: listing
field is loaded but its contents are not loaded (Important note: for statically typed programmer, it is hard to guess omitting the field itself unless it is clarified).
Also I misunderstood that the expression $(inputs.d.listing === undefined)
represents "inputs.d.listing
has undefined
value" because I am not familiar with the detail of JavaScript.
IMO it is hard for non-JavaScript programmers to guess this behavior because it seems to compare the value rather than to check the existence of the field.
I can't think of another useful meaning for Case 1. We can't forbid it in CWL v1.2, as null is explicitly allowed:
Unfortunately it is hard to fix it, I guess.
The SALAD (and CWL spec that depends on SALAD) uses the null
type in the following distinct meanings:
null
value, andlisting
that enables users to use Directory
object without listing
field in the input object)As a result of that, schema-salad-tool passes the following CWL document:
#!/usr/bin/env cwl-runner
class: CommandLineTool
cwlVersion: v1.2
requirements:
LoadListingRequirement:
loadListing: no_listing
InlineJavascriptRequirement: {}
inputs:
d: Directory
outputs:
out:
type: boolean
outputBinding:
outputEval: $(inputs.d.listing === undefined)
baseCommand: "true"
hints: null # !!!!????
$ schema-salad-tool CommonWorkflowLanguage.yml tests/listing_none2.cwl
/usr/local/bin/schema-salad-tool Current version: 7.0.20201119201711
Document `tests/listing_none2.cwl` is valid
I'm not sure it is a limitation of SALAD itself or it comes from JSON-LD or Avro.
IMO treating listing: null
as "listing
field is missing" is reasonable in the current situation because it needs less implementation changes (SALAD already do that as explained above) and is consistent with other parts of the spec of CWL such as omitting rules for input and output parameters.
@tetron Do you think that listing: null
is equivalent to "listing
field is not present"?
I agree. We've been trying to avoid the distinction between "null" and "undefined" in the data model, so things that are "undefined" are considered "null". That's also a much better policy for code generation where the fields already exist but can have a "null" value.
I agree. We've been trying to avoid the distinction between "null" and "undefined" in the data model, so things that are "undefined" are considered "null". That's also a much better policy for code generation where the fields already exist but can have a "null" value.
Is that a CWL v1.2.1 clarification or a CWL v1.3/2.0 change?
I would call it a clarification, I'm trying to find the reference (assuming we wrote it down) that fields not specified in input objects get turned into null
.
@tetron @tom-tan Upon review of all the optional fields and references to null
it appears to be safe to state that fieldname: null
is the same as that fieldname
not being specified (and vice-versa). I made a separate issue to track that: https://github.com/common-workflow-language/cwl-v1.2/issues/87
On this note, there seems to be a conflict in conformance test 86 if I am not mistaken. The default behaviour for loadListing is no_listing
(https://www.commonwl.org/v1.2/CommandLineTool.html#CommandOutputBinding), so the listing property should not be present at all. The test 86 (https://github.com/common-workflow-language/cwl-v1.2/blob/main/conformance_tests.yaml#L1178) globs a directory
and expected output contains the listing property
output:
"outdir": {
"class": "Directory",
"listing": [
{
"class": "File",
"location": "goodbye.txt",
"checksum": "sha1$dd0a4c4c49ba43004d6611771972b6cf969c1c01",
"size": 24
},
{
"class": "File",
"location": "hello.txt",
"checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b",
"size": 13
}
],
}
so the test fails if listing is missing, but it should be missing because the output is defined as
outputs:
outdir:
type: Directory
outputBinding:
glob: .
We can simply add load_listing: shallow_listing
to the test to handle this, but on the topic of this thread it confuses me how this test passes anywhere if loadListing is implemented this way, am I missing something?
The default behaviour for loadListing is no_listing
Looks like this default behavior changed between v1.0 and v1.2 of CWL specs, which I found experimentally recently while transitioning our 300 CWL files from v1.0 to v1.2.
I see in https://www.commonwl.org/v1.1/Workflow.html#Changelog
Added LoadListingRequirement and loadListing to control whether and how Directory listings should be loaded for use in expressions.
but it is not clear that the default changed.
Thanks for adding conditional execution in 1.2!
while transitioning our 300 CWL files from v1.0 to v1.2
Interesting! Any particular reason you did a mass upgrade? Did you use https://github.com/common-workflow-language/cwl-upgrader/ ?
FYI, the cwl-upgrader
does set loadListing: deep_listing
when upgrading to CWL v1.2 if there was no cwltool#LoadListingRequirement
extension already used in the pre v1.2 document.
Thank you for your reply, Michael.
Any particular reason you did a mass upgrade?
Conditional execution. We did not use the upgrader. Didn't know it existed (our fault).
extension already used in the pre v1.2 document
Right. It was mentioned first in v1.1, AFAIK.
Thank you for letting us know about the existence of the upgrader, beats perl -i -pe 's{^cwlVersion:\s+1.0}{cwlVersion: 1.2}g'
:-) .
We are looking forward to using it when CWL 1.3 will come (and 2.0, and 3.0 and further on).
Azat
PS. When I was thinking about conditional execution, I was thinking about:
run: Expression
runChoices: [flow.cwl, flow2.cwl]
where "to do or not to do" are equivalent workflows, but that would lead potentially to combinatorial explosion over runChoices
in multiple condition points in code during validation.
Anyway, the ship has sailed.
Conditional execution.
Cool! For reference, you can mix v1.0
and v1.1
, and v1.2
CommandLineTool
s, ExpressionTool
s, and sub-Workflows
in a v1.2
Workflow
. We even have tests for version mixing as part of the CWL v1.2 conformance tests.
Didn't know [that cwl-upgrader] existed
No worries. I updated the description on https://www.commonwl.org to make it more obvious. I've also opened an issue to mention cwl-upgrader
and explain version mixing as part of the upcoming CWL v1.2.1 update
Right. It was mentioned first in v1.1, AFAIK.
Yep, I mispoke; I even linked to the v1.0
to v1.1
code without noticing it :-)
We are looking forward to using it when CWL 1.3 will come (and 2.0, and 3.0 and further on).
Huzzah :-)
our 300 CWL files
Are these public? It would be great to have the CommandLineTools
donated to https://github.com/common-workflow-library/bio-cwl-tools
For reference, you can mix
Good to know for a stopgap measure, but my practice is to avoid stopgaps as much as possible.
It would be great to have the CommandLineTools donated
Sure. The caveat is that they are heavily ASN.1-centric tools. ASN.1 is our go-to format of data internally for things related to biomolecules and other data. Unfortunately, this format is not very popular outside of NCBI.
People can still try to rework the workflows within our docker containers to build on PGAP by mounting their own CWL directories. For example, in my current addition of a new workflow to internal (in internal workflow format) and external (CWL) PGAP, 80% of tools were already there. As usual, remaining 20% is a bottleneck, especially to people unfamiliar with NCBI toolkit.
but my practice is to avoid stopgaps as much as possible.
:+1:
The caveat is that they are heavily ASN.1-centric tools.
Sure, we are still happy to have them!
I think there might be a similar conflict in 1.2 conformance test ID 131.
The tool asks for a Directory input with no load_listing key, and has no LoadListingRequirement itself, meaning that by default it should see no listing, right?
But then in its requirements section it accesses the listing anyway:
The test happens to run it with a Directory literal that specifies a listing, and I think cwltool
might just pass that listing along:
Are these public?
It looks like I did not post the obvious answer to this. Sorry about that.
Yes, they are posted here at Github: https://github.com/ncbi/pgap
On this note, there seems to be a conflict in conformance test 86 if I am not mistaken. The default behaviour for loadListing is
no_listing
(https://www.commonwl.org/v1.2/CommandLineTool.html#CommandOutputBinding), so the listing property should not be present at all. The test 86 (https://github.com/common-workflow-language/cwl-v1.2/blob/main/conformance_tests.yaml#L1178) globs a directory and expected output contains the listing propertyso the listing property should not be present at all.
loadListing
: Specify the desired behavior for loading thelisting
field of a Directory object for use by expressions.
(emphasis added)
The final output object must have all Directory
listing
s recursively expanded, as the location
s of each File
may be very different paths. I've opened https://github.com/common-workflow-language/cwl-v1.2/issues/108 to clarify this implicit requirement for CWL v1.2.1
I think there might be a similar conflict in 1.2 conformance test ID 131.
The tool asks for a Directory input with no load_listing key, and has no LoadListingRequirement itself, meaning that by default it should see no listing, right?
I think you've found a bug, yes.
cwltool
should trim listing
s in this case & the conformance test should be adjusted for CWL v1.2.1; likewise a conformance test to confirm the trimming in this scenario should also be added.
The spec of
LoadListingEnum
used inLoadListingRequirement
says:On the other hand, the spec of
Directory
says that:I am not sure I can say it a "conflict" but at least CWL v1.0, that has the same description of
listing
, implicitly requires the platforms loading the Directory listing recursively.It would be nice if the section of
Directory
mentionsloadListing
andLoadListingRequirement
.