common-workflow-language / cwl-v1.2

Released CWL v1.2.1 specification
https://www.commonwl.org/v1.2/
Apache License 2.0
34 stars 22 forks source link

Invalid definition of default value in sample YAML file #252

Closed fmigneault closed 1 year ago

fmigneault commented 1 year ago

The following definition, although it really resembles a valid float, is actually parsed as string in YAML.

https://github.com/common-workflow-language/cwl-v1.2/blob/d50fad00eedb071ef7372a0f471e6fc1155addcc/tests/floats_small_and_large.cwl#L19-L21

x = yaml.safe_load("""
test:
  default: 1.23e5
""")

type(x["test"]["default"])
<class 'str'>

x["test"]["default"]
'1.23e5'
x = yaml.safe_load("""
test:
  default: 1.23e+5
""")

type(x["test"]["default"])
<class 'float'>

x["test"]["default"]
123000.0

The scientific notation requires a +/- symbol. Because the CWL test passes the values as strings on the command line, I'm guessing this silently succeeds. However, the JSON schema validation I'm working on (https://github.com/opengeospatial/ogcapi-processes/pull/329) fails for this tool with the invalid match of type: float not equal to default's type.

fmigneault commented 1 year ago

Same in https://github.com/common-workflow-language/cwl-v1.2/blob/d50fad00eedb071ef7372a0f471e6fc1155addcc/tests/floats_small_and_large_nojs.cwl#L16-L19

mr-c commented 1 year ago

I don't see the requirement for a +/- symbol in the YAML specs: https://yaml.org/spec/1.2.1/#tag/repository/float

Using the provided regex from the YAML spec, I get a match for your example: https://regex101.com/r/j5Fsu5/1

Maybe switch to using a different YAML library, like ruamel.yaml? https://pypi.org/project/ruamel.yaml/

See also https://github.com/yaml/pyyaml/issues/173

fmigneault commented 1 year ago

This is weird. Even their own JSON sample doesn't work. And this is using the builtin json library.

j = json.loads('[ 0., -0.0, 12e03, -2E+05 ]')
Traceback (most recent call last):
  File "/home/francis/dev/conda/envs/terradue/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-ae4e14ea774e>", line 1, in <module>
    j = json.loads('[ 0., -0.0, 12e03, -2E+05 ]')
  File "/home/francis/dev/conda/envs/terradue/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/francis/dev/conda/envs/terradue/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/francis/dev/conda/envs/terradue/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 4 (char 3)

We need to add quotes around the 0. and 12e03 not recognized as valid floats.

j = json.loads('[ "0.", -0.0, "12e03", -2E+05 ]')
j
['0.', -0.0, '12e03', -200000.0]

Adding a + works though:

j = json.loads('[ "0.", -0.0, 12e+03, -2E+05 ]')
j
['0.', -0.0, 12000.0, -200000.0]
fmigneault commented 1 year ago

Same behaviour occurs using https://pypi.org/project/simplejson, the most common alternative I have seen to builtin json.

Really annoying... but yes, ruamel.yaml seems the only one that parses it as expected without adjusting the values themselves.

import yaml 
import ruamel.yaml as ryaml

yaml.safe_load('[ 0., -0.0, 12e03, -2E+05 ]')
[0.0, -0.0, '12e03', '-2E+05']

yaml.safe_load('[ 0., -0.0, 12.e03, -2E+05 ]')
[0.0, -0.0, '12.e03', '-2E+05']

yaml.safe_load('[ 0., -0.0, 12.e+03, -2E+05 ]')
[0.0, -0.0, 12000.0, '-2E+05']

ryaml.safe_load('[ 0., -0.0, 12e03, -2E+05 ]')
[0.0, -0.0, 12000.0, -200000.0]
mr-c commented 1 year ago

Thanks for the investigation @fmigneault . I'm going to close this issue.