dbt-labs / hologram

A library for automatically generating Draft 7 JSON Schemas from Python dataclasses
MIT License
9 stars 13 forks source link

Do not use jsonschema.best_match because it obscures the errors #35

Closed gshank closed 3 years ago

gshank commented 4 years ago

The 'best_match' code in jsonschema was switching to the "first" error encountered and so the exception didn't have all of the information that we want. This is for issue #2700 in dbt.

beckjake commented 4 years ago

I think this is an improvement, we should merge it and then see if we can do better later.

Here's an example where this is definitely worse behavior in some ways:

from typing import Union, Any
import dataclasses, hologram
@dataclasses.dataclass
class Bar(hologram.JsonSchemaMixin):
    asdf: Union[str, Dict[str, Union[bool, str]]] = ''

Bar().from_dict({'asdf': {'key': 1}})

Results in this:

ValidationError: {'key': 1} is not valid under any of the given schemas

Failed validating 'oneOf' in schema['properties']['asdf']:
    {'default': '',
     'oneOf': [{'type': 'string'},
               {'additionalProperties': {'oneOf': [{'type': 'boolean'},
                                                   {'type': 'string'}]},
                'type': 'object'}]}

On instance['asdf']:
    {'key': 1}

Which, while definitely correct, is very much not as correct as the old message:

ValidationError: 1 is not of type 'string'

Failed validating 'type' in schema[0]:
    {'type': 'string'}

On instance:
    1

I think this is the right change - I would rather have too much information than too little, as we currently do. But I also think we must be able to do better by picking some fancy relevance function for best_error, or deciding on a better definition of best" than either of these two. You can imagine how painful this could get, given a complicated-enough type.