dbt-labs / hologram

A library for automatically generating Draft 7 JSON Schemas from Python dataclasses
MIT License

Hologram should support custom error matchers #29

Open beckjake opened 4 years ago

beckjake commented 4 years ago

During validation, the jsonschema library raises a very rich, complicated ValidationError object that represents a tree of error causes. Hologram wraps this in validate, which in part calls jsonschema.exceptions.best_match(validator.iter_errors(data)). In 99% of cases this works great, but it falls apart for Unions, where the "best match" is frequently not helpful. As the author of a complicated JsonSchemaMixin, I sometimes have my own heuristic I'd like to use (for instance: if the error is about this key, it's the least likely to be the real issue).

jsonschema.exceptions.best_match actually accepts a key function that is used to prioritize the errors. I would like to be able to override that key in my class's validate, ideally without explicitly reaching into jsonschema itself. Here's an example of something I have written currently:

def _relevance_without_strategy(error: jsonschema.ValidationError):
    # calculate the 'relevance' of an error the normal jsonschema way, except
    # if the validator is in the 'strategy' field and it's conflicting with the
    # 'enum'. This suppresses "'timestamp' is not one of ['check']" and such
    if 'strategy' in error.path and error.validator in {'enum', 'not'}:
        length = 1
    else:
        length = -len(error.path)
    validator = error.validator
    return length, validator not in {'anyOf', 'oneOf'}

@dataclass
class ParsedSnapshotNode(ParsedNode):
    resource_type: NodeType = field(metadata={'restrict': [NodeType.Snapshot]})
    # this is a union of 3 types that are differentiated by the "strategy" key: "check", "timestamp", or "anything else"
    config: Union[
        CheckSnapshotConfig,
        TimestampSnapshotConfig,
        GenericSnapshotConfig,
    ]

    @classmethod
    def validate(cls, data: Any):
        schema = hologram._validate_schema(cls)
        validator = jsonschema.Draft7Validator(schema)
        error = jsonschema.exceptions.best_match(
            validator.iter_errors(data),
            key=_relevance_without_strategy,
        )
        if error is not None:
            raise hologram.ValidationError.create_from(error) from error

That's gross! The only thing I actually wanted to override was the key function passed to best_match. A nice interface might be this on JsonSchemaMixin:

    @classmethod
    def _best_match_key(cls) -> Callable[[jsonschema.ValidationError], Any]:
        # this is the default
        return jsonschema.exceptions.relevance

Then JsonSchemaMixin.validate would pass key=cls._best_match_key() along to jsonschema.exceptions.best_match, and a subclass would only need to override _best_match_key instead of reimplementing validate.
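
To make the shape of that hook concrete, here is a rough, stdlib-only sketch of the override pattern. It is not hologram's or jsonschema's code: StubError, default_relevance, pick_best, MixinSketch and SnapshotSketch are all hypothetical names invented for illustration. pick_best is a simplified stand-in for best_match that just takes the highest-ranked error, so the demotion here uses the lowest possible rank rather than the length = 1 trick from the snippet above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional, Tuple


@dataclass
class StubError:
    """Hypothetical stand-in for jsonschema.ValidationError (path and validator only)."""
    path: Tuple[str, ...]
    validator: str


def default_relevance(error: StubError) -> Any:
    # Simplified default key: errors with a shorter path rank higher.
    return -len(error.path)


def pick_best(
    errors: Iterable[StubError],
    key: Callable[[StubError], Any],
) -> Optional[StubError]:
    # Simplified stand-in for best_match: highest-ranked error, or None if there are none.
    return max(errors, key=key, default=None)


class MixinSketch:
    @classmethod
    def _best_match_key(cls) -> Callable[[StubError], Any]:
        # The proposed hook: subclasses override this, not validate.
        return default_relevance

    @classmethod
    def select_error(cls, errors: Iterable[StubError]) -> Optional[StubError]:
        # The only change validate would need: pass the hook's key through.
        return pick_best(errors, key=cls._best_match_key())


class SnapshotSketch(MixinSketch):
    @classmethod
    def _best_match_key(cls) -> Callable[[StubError], Any]:
        def key(error: StubError) -> Any:
            # Rank enum/not errors under 'strategy' below everything else.
            if 'strategy' in error.path and error.validator in {'enum', 'not'}:
                return float('-inf')
            return default_relevance(error)
        return key


errors = [
    StubError(path=('config', 'strategy'), validator='enum'),
    StubError(path=('config', 'unique_key', 'name'), validator='type'),
]
default_pick = MixinSketch.select_error(errors)
custom_pick = SnapshotSketch.select_error(errors)
```

With the default key the shallower 'strategy' enum error is selected; with the subclass's key the 'type' error is selected instead, and the subclass never touches the validation logic itself.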