fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Tuple support in JSON Schemas depends on draft version #564

Closed tcompa closed 10 months ago

tcompa commented 10 months ago

Here is another subtlety of JSON Schemas, which came up as part of the work on the new import-ome-zarr task.

TLDR

I need to introduce an argument which is a length-two list (that is, the YX shape of a grid), and the best option in that case is a tuple[int, int] type hint. But the handling of this type in JSON Schema changed when going from version 2019-09 to 2020-12 (https://json-schema.org/draft/2020-12/release-notes#changes-to-items-and-additionalitems): the array form of items, which is what we currently generate for a tuple, was replaced by prefixItems. This means that we would get JSON Schemas that are not valid in the more recent version.
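To make the keyword change concrete, here is a minimal sketch of a converter from the pre-2020-12 tuple form to the 2020-12 one. The helper name `to_2020_12` and the `DATA_KEYWORDS` set are hypothetical, and the walk is deliberately naive (for instance, it would skip a property that happens to be named "default"), so take it as an illustration of the mapping rather than a complete migration tool.

```python
# Keywords whose values are plain data, not subschemas: leave them untouched.
DATA_KEYWORDS = {"default", "const", "enum", "examples"}


def to_2020_12(node):
    """Return a copy of `node` using the 2020-12 tuple keywords.

    Up to 2019-09:  "items": [s0, s1]        (+ optional "additionalItems": s)
    From 2020-12:   "prefixItems": [s0, s1]  (+ optional "items": s)
    """
    if isinstance(node, list):
        return [to_2020_12(item) for item in node]
    if not isinstance(node, dict):
        return node
    converted = {}
    for key, value in node.items():
        if key in DATA_KEYWORDS:
            converted[key] = value
        elif key == "items" and isinstance(value, list):
            # Array-form `items` becomes `prefixItems`.
            converted["prefixItems"] = to_2020_12(value)
        elif key == "additionalItems" and isinstance(node.get("items"), list):
            # `additionalItems` takes over the (now free) `items` keyword.
            converted["items"] = to_2020_12(value)
        else:
            converted[key] = to_2020_12(value)
    return converted


# The schema fragment produced for `tuple[int, int]` in the pre-2020-12 form.
old_style = {
    "type": "array",
    "minItems": 2,
    "maxItems": 2,
    "items": [{"type": "integer"}, {"type": "integer"}],
}
new_style = to_2020_12(old_style)
print(new_style)
```

The converted fragment keeps `minItems`/`maxItems` and only renames the keyword that carries the per-position schemas, which is exactly the part the two drafts disagree on.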

For the moment I prefer to stick with producing very general JSON Schemas, supported by all recent versions (Draft 7, 2019-09 and 2020-12). This is also because changing which version we support would affect several other Fractal aspects (e.g. the pydantic version we use in fractal-tasks-core, and the implementation of argument forms in fractal-web). See also https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/375.

Workaround

I explored two possible workarounds:

  1. A trivial one, where I use two different parameters (one for Y, one for X).
  2. A slightly more complex one, where I define a GridSize pydantic model to be used as a type hint for a single parameter.

Both workarounds result in schemas that are broadly compatible (i.e. they are valid with JSON Schema Draft 7, 2019-09 and 2020-12).

The first one is a bit verbose, but the form rendering is very transparent. [Screenshot from 2023-10-12 11-47-30]

The second one is more elegant, but the fractal-web rendering seems heavy. [Screenshot from 2023-10-12 11-45-06]

I'm proceeding with option 1, unless there are strong opinions in favor of option 2.
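For reference, the two options could look roughly like this as task signatures. This is only a sketch: the function and parameter names are hypothetical, and a frozen stdlib dataclass stands in for the pydantic GridSize model so that the snippet has no dependencies.

```python
from dataclasses import dataclass


# Option 1: two plain integer parameters (hypothetical names).
def task_two_params(grid_y: int = 2, grid_x: int = 2) -> tuple[int, int]:
    """Return the YX grid shape built from two separate arguments."""
    return (grid_y, grid_x)


# Option 2: a single parameter typed with a small model.
# NOTE: in the issue this is a pydantic model; the dataclass is a stand-in.
@dataclass(frozen=True)
class GridSize:
    y: int = 2
    x: int = 2


def task_grid_model(grid: GridSize = GridSize()) -> tuple[int, int]:
    """Return the YX grid shape read from a single structured argument."""
    return (grid.y, grid.x)


print(task_two_params(grid_y=3, grid_x=4))
print(task_grid_model(GridSize(y=3, x=4)))
```

In both cases each grid dimension ends up as a plain integer field, so neither signature needs the tuple keywords that differ between drafts.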

Full example

This example is here for the record, for when we get back to this kind of problem.

from devtools import debug
from jsonschema.exceptions import SchemaError
from jsonschema.validators import Draft201909Validator
from jsonschema.validators import Draft202012Validator
from jsonschema.validators import Draft7Validator
from pydantic.decorator import ValidatedFunction


def task_function(
    x: tuple[int, int] = (2, 2),
):
    pass


# Build the JSON Schema of the function arguments (pydantic v1 API)
vf = ValidatedFunction(task_function, config=None)
schema = vf.model.schema()
debug(schema)

# The schema is valid for Draft 7 and 2019-09 ...
Draft7Validator.check_schema(schema)
Draft201909Validator.check_schema(schema)

# ... but not for 2020-12, where `items` can no longer be an array
try:
    Draft202012Validator.check_schema(schema)
except SchemaError as e:
    print("*** Validation failed with Draft202012Validator ***")
    print(f"Original error:\n{e}")

Output (with Python 3.10.12, pydantic 1.10.13, jsonschema 4.19.1):

$ python test.py 
test.py:18 <module>
    schema: {
        'title': 'TaskFunction',
        'type': 'object',
        'properties': {
            'x': {
                'title': 'X',
                'default': (2, 2),
                'type': 'array',
                'minItems': 2,
                'maxItems': 2,
                'items': [
                    {
                        'type': 'integer',
                    },
                    {
                        'type': 'integer',
                    },
                ],
            },
            'v__duplicate_kwargs': {
                'title': 'V  Duplicate Kwargs',
                'type': 'array',
                'items': {
                    'type': 'string',
                },
            },
            'args': {
                'title': 'Args',
                'type': 'array',
                'items': {},
            },
            'kwargs': {
                'title': 'Kwargs',
                'type': 'object',
            },
        },
        'additionalProperties': False,
    } (dict) len=4
*** Validation failed with Draft202012Validator ***
Original error:
[{'type': 'integer'}, {'type': 'integer'}] is not of type 'object', 'boolean'

Failed validating 'type' in metaschema['allOf'][1]['properties']['properties']['additionalProperties']['$dynamicRef']['allOf'][1]['properties']['items']['$dynamicRef']['allOf'][0]:
    {'$defs': {'anchorString': {'pattern': '^[A-Za-z_][-A-Za-z0-9._]*$',
                                'type': 'string'},
               'uriReferenceString': {'format': 'uri-reference',
                                      'type': 'string'},
               'uriString': {'format': 'uri', 'type': 'string'}},
     '$dynamicAnchor': 'meta',
     '$id': 'https://json-schema.org/draft/2020-12/meta/core',
     '$schema': 'https://json-schema.org/draft/2020-12/schema',
     '$vocabulary': {'https://json-schema.org/draft/2020-12/vocab/core': True},
     'properties': {'$anchor': {'$ref': '#/$defs/anchorString'},
                    '$comment': {'type': 'string'},
                    '$defs': {'additionalProperties': {'$dynamicRef': '#meta'},
                              'type': 'object'},
                    '$dynamicAnchor': {'$ref': '#/$defs/anchorString'},
                    '$dynamicRef': {'$ref': '#/$defs/uriReferenceString'},
                    '$id': {'$comment': 'Non-empty fragments not allowed.',
                            '$ref': '#/$defs/uriReferenceString',
                            'pattern': '^[^#]*#?$'},
                    '$ref': {'$ref': '#/$defs/uriReferenceString'},
                    '$schema': {'$ref': '#/$defs/uriString'},
                    '$vocabulary': {'additionalProperties': {'type': 'boolean'},
                                    'propertyNames': {'$ref': '#/$defs/uriString'},
                                    'type': 'object'}},
     'title': 'Core vocabulary meta-schema',
     'type': ['object', 'boolean']}

On schema['properties']['x']['items']:
    [{'type': 'integer'}, {'type': 'integer'}]
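The error boils down to `items` being a list where 2020-12 expects a schema (an object or a boolean). As a dependency-free way to spot such places before running a full validator, something like the following sketch could be used. The helper name `find_array_items` is hypothetical, the schema is a trimmed copy of the one printed above, and the walk does not distinguish data values (e.g. a `default` that contains an `items` key) from subschemas.

```python
def find_array_items(node, path="schema"):
    """Yield the path of every `items` keyword whose value is a list."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}[{key!r}]"
            if key == "items" and isinstance(value, list):
                # Pre-2020-12 tuple form: rejected by the 2020-12 metaschema.
                yield child
            yield from find_array_items(value, child)
    elif isinstance(node, (list, tuple)):
        for index, value in enumerate(node):
            yield from find_array_items(value, f"{path}[{index}]")


# Trimmed copy of the schema from the output above.
schema = {
    "title": "TaskFunction",
    "type": "object",
    "properties": {
        "x": {
            "title": "X",
            "default": (2, 2),
            "type": "array",
            "minItems": 2,
            "maxItems": 2,
            "items": [{"type": "integer"}, {"type": "integer"}],
        },
        "args": {"title": "Args", "type": "array", "items": {}},
    },
    "additionalProperties": False,
}

offending = list(find_array_items(schema))
print(offending)
```

Only the `x` property is reported; the `args` property uses the single-schema form of `items`, which is accepted by all three drafts.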

Perspective

We will eventually need to switch to Pydantic v2, which officially supports a specific (and recent) JSON Schema version (i.e. 2020-12, see https://docs.pydantic.dev/latest/concepts/json_schema). Note that the v1.10 documentation does not explicitly mention which JSON Schema version is produced (https://docs.pydantic.dev/1.10/usage/schema/).

This change will come with some issues:

  1. The switch itself should be fairly manageable within fractal-tasks-core, but there may be significant blockers when coexisting with other libraries (either fractal-tasks-core dependencies, or libraries that may be installed together with fractal-tasks-core in other packages).
  2. The second issue (which I think is actually an improvement) is that we'll need to review the fractal-web implementation, and explicitly state which JSON Schema version(s) we require/support.

A positive aspect

Our CI validates schemas against multiple JSON Schema versions. The docstring of that test used to say "FIXME: it is not clear whether this test is actually useful", but it turns out that the test was indeed useful.