23andMe / Yamale

A schema and validator for YAML.
MIT License

Enhancement: Make schema depend on available keys #159

Status: Open · opened by Mitmischer 3 years ago

Mitmischer commented 3 years ago

First of all, thank you for the great package!

I have the following use case and wonder if it's already implemented or if it's missing from Yamale.

a:
  mode: 'one'
  one_paramsetX: '... some parameters for mode one ...'
  one_paramsetY: '... some more parameters for mode one ...'

b:
  mode: 'two'
  two_paramset: '... some parameters for mode two ...'

In this case, one_paramsetX and one_paramsetY are only required for object a, and two_paramset is only required for object b. The decision should be based on the value of mode. To me, there currently seems to be no way to make this work within Yamale.

A simpler use case would be to make the "requiredness" of some keys depend on the presence of other keys rather than on their values. That would work a bit like the dependencies or contains rules of Cerberus: https://docs.python-cerberus.org/en/stable/validation-rules.html

mechie commented 3 years ago

If I were approaching this with the validators included in Yamale, I think something like this would work:

thing: any(include('type_a'), include('type_b'))
---
type_a:
  mode: enum('one')
  one_paramsetX: ...
  one_paramsetY: ...

type_b:
  mode: enum('two')
  two_paramset: ...

If there were any shared parameters, I would write a "base" type with just those, and then merge them into type_a and type_b.
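To see why this pattern can get noisy, it helps to picture what an any-of validator has to do: try each alternative in turn and, if none passes, report the errors from all of them. The sketch below is plain Python, not Yamale's implementation; `check_keys` is a hypothetical stand-in for `include()`, and its error strings only imitate the messages quoted later in this thread.

```python
from typing import Callable, List

# A validator takes a dict and returns a list of error strings (empty = valid).
Validator = Callable[[dict], List[str]]


def check_keys(required: dict) -> Validator:
    """Hypothetical stand-in for include().

    `required` maps key -> expected value (None means "must be present, any value").
    Keys not listed are reported as unexpected, roughly like strict validation.
    """
    def validate(node: dict) -> List[str]:
        errors = []
        for key, expected in required.items():
            if key not in node:
                errors.append(f"{key}: Required field missing")
            elif expected is not None and node[key] != expected:
                errors.append(f"{key}: {node[key]!r} not in ({expected!r},)")
        for key in node:
            if key not in required:
                errors.append(f"{key}: Unexpected element")
        return errors

    return validate


def any_of(*alternatives: Validator) -> Validator:
    """Pass if any alternative passes; otherwise return every alternative's errors."""
    def validate(node: dict) -> List[str]:
        collected = []
        for alt in alternatives:
            errors = alt(node)
            if not errors:
                return []  # the first matching alternative wins
            collected.extend(errors)
        return collected

    return validate


type_a = check_keys({"mode": "one", "one_paramsetX": None, "one_paramsetY": None})
type_b = check_keys({"mode": "two", "two_paramset": None})
thing = any_of(type_a, type_b)

print(thing({"mode": "two", "two_paramset": "x"}))  # []
print(len(thing({"mode": "two", "two_paramset": "x", "foo": "bar"})))  # 6
```

With only two alternatives, one stray key already produces six errors, five of them from the alternative the author never meant to use; the count grows with every `include()` added to the `any()`.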

That said, I've been using required_if_present and required_if_missing as kwargs in a private yamale fork in order to change requiredness based on sibling nodes (not) existing, which sounds analogous to what you bring up last. Might be something to think about upstreaming.
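The fork's actual behavior isn't shown here, so the following is only one plausible reading of `required_if_present` / `required_if_missing`, written as a standalone Python check rather than as Yamale validator kwargs. Assumptions: a key becomes required if *any* listed sibling is present (or missing, respectively), and the `rules` dict layout is hypothetical.

```python
from typing import Dict, List


def check_conditional_required(node: dict, rules: Dict[str, dict]) -> List[str]:
    """Hypothetical sketch of presence-based requiredness.

    `rules` maps a key to a dict that may contain:
      - "required_if_present": siblings whose presence makes the key required
      - "required_if_missing": siblings whose absence makes the key required
    "Any listed sibling triggers it" is an assumption, not confirmed behavior.
    """
    errors = []
    for key, rule in rules.items():
        if key in node:
            continue  # already present, nothing to enforce
        present = [s for s in rule.get("required_if_present", ()) if s in node]
        missing = [s for s in rule.get("required_if_missing", ()) if s not in node]
        if present:
            errors.append(f"{key}: Required field missing (because {present[0]} is present)")
        elif missing:
            errors.append(f"{key}: Required field missing (because {missing[0]} is missing)")
    return errors


rules = {
    "one_paramsetY": {"required_if_present": ["one_paramsetX"]},
    "two_paramset": {"required_if_missing": ["one_paramsetX"]},
}

print(check_conditional_required({"one_paramsetX": "..."}, rules))
# ['one_paramsetY: Required field missing (because one_paramsetX is present)']
```

Because each rule only looks at sibling keys, this covers the presence-based case from the original post but not the value-based `mode` case.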

Cerberus looks really interesting, hadn't heard of it before, thanks for bringing it up!

aaronlutz commented 2 years ago

@mechie I've been using an approach like this for a while now, and we've started drowning under it. We have a configuration that has a list of dictionaries, and a type key in each dictionary determines which rules the other key-value pairs have to follow. Here is what our any(*include) looks like:

transform: include('transform')
---
transform:
  transformation: include('transformations')

transformations: >-
    list(
      any(
        include('AddMissingColumns'),
        include('AsType'),
        include('ConvertMicroSecsToMilli'),
        include('DefaultValues'),
        include('Denest'),
        include('DropDuplicates'),
        include('DuplicateColumns'),
        include('ETLImportDate'),
        include('FormatDates'),
        include('RowNumbers'),
        include('FromJson'),
        include('HashAttributes'),
        include('LowerCaseColumnNames'),
        include('LowerCaseKeys'),
        include('NullifyValuesWithLetters'),
        include('OrderColumns'),
        include('RegexSub'),
        include('RemoveRowsWhereNull'),
        include('RemoveRowsWhereNotNull'),
        include('RemoveRowsWithEmptyList'),
        include('RemoveUnexpectedColumns'),
        include('RenameAttributes'),
        include('Replace'),
        include('SliceStrings'),
        include('Sort'),
        include('StandardizeColumns'),
        include('StripHTML'),
        include('StripWhitespace'),
        include('StrReplace'),
        include('ToDatetime'),
        include('ToInt'),
        include('ToIntAmongStrings'),
        include('ToJson'),
        include('TruncateBytes'),
        include('AirTableTransform'),
        include('AddSourceS3Key'),
        include('AddValuesFromSourceTags')
      ),
      required=True
    )

However, this results in an INSANE failure message even when there is only one violation. The dictionary being validated is compared against every single one of the includes, which creates a needle-in-a-haystack problem when trying to figure out which key is at fault. Here is a case where I added foo: bar to one of the dictionaries, and the result is:

    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('AddMissingColumns',)
    transform.transformations.1.desired_columns: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('AsType',)
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ConvertMicroSecsToMilli',)
    transform.transformations.1.attribute: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('DefaultValues',)
    transform.transformations.1.defaults: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('Denest',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('DropDuplicates',)
    transform.transformations.1.subset: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('DuplicateColumns',)
    transform.transformations.1.originals: Required field missing
    transform.transformations.1.duplicates: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ETLImportDate',)
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('FormatDates',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.formats: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RowNumbers',)
    transform.transformations.1.column_name: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('FromJson',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('HashAttributes',)
    transform.transformations.1.hash_column_name: Required field missing
    transform.transformations.1.hash_columns: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('LowerCaseColumnNames',)
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('LowerCaseKeys',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('NullifyValuesWithLetters',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('OrderColumns',)
    transform.transformations.1.column_order: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RegexSub',)
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWhereNull',)
    transform.transformations.1.attribute: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWhereNotNull',)
    transform.transformations.1.attribute: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWithEmptyList',)
    transform.transformations.1.attribute: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('RemoveUnexpectedColumns',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('Replace',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.replacements: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('SliceStrings',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.slice_args: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('Sort',)
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('StandardizeColumns',)
    transform.transformations.1.desired_columns: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('StripHTML',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('StripWhitespace',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('StrReplace',)
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ToDatetime',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ToInt',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ToIntAmongStrings',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('ToJson',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('TruncateBytes',)
    transform.transformations.1.attributes: '{}' is not a list.
    transform.transformations.1.byte_max: Required field missing
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('AirTableTransform',)
    transform.transformations.1.accepted_values: Required field missing
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('AddSourceS3Key',)
    transform.transformations.1.attributes: Unexpected element
    transform.transformations.1.foo: Unexpected element
    transform.transformations.1.type: 'RenameAttributes' not in ('AddValuesFromSourceTags',)
    transform.transformations.1.columns_from_tags: Required field missing

What Yamale needs is an operator that lets you define the schema at a certain key path based on the value of a key at that key path.
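As a rough illustration of what such an operator could do, here is a plain-Python sketch of a "switch on a key" validator: it reads the `type` value, validates the rest of the dictionary against only the matching variant, and reports a single error for an unknown type. `keyed_by` and `fields` are hypothetical names, not part of Yamale, and the per-variant field names are guesses taken from the error output above.

```python
from typing import Callable, Dict, Iterable, List

Validator = Callable[[dict], List[str]]


def fields(required: Iterable[str], optional: Iterable[str] = ()) -> Validator:
    """Hypothetical per-variant check: required keys must exist, nothing else is allowed."""
    required_set = set(required)
    allowed = required_set | set(optional)

    def validate(node: dict) -> List[str]:
        errors = [f"{k}: Required field missing" for k in sorted(required_set - set(node))]
        errors += [f"{k}: Unexpected element" for k in sorted(set(node) - allowed)]
        return errors

    return validate


def keyed_by(discriminator: str, variants: Dict[str, Validator]) -> Validator:
    """Hypothetical 'switch on a key' operator.

    Looks up node[discriminator], validates the rest of the node against only
    the matching variant, and reports one error for unknown values.
    """
    def validate(node: dict) -> List[str]:
        if discriminator not in node:
            return [f"{discriminator}: Required field missing"]
        value = node[discriminator]
        if not isinstance(value, str) or value not in variants:
            allowed = ", ".join(sorted(variants))
            return [f"{discriminator}: {value!r} is not one of: {allowed}"]
        # Strip the discriminator so each variant only describes its own keys.
        rest = {k: v for k, v in node.items() if k != discriminator}
        return variants[value](rest)

    return validate


# Field names are guessed from the error output above, for illustration only.
transformation = keyed_by("type", {
    "RenameAttributes": fields(required=["attributes"]),
    "DropDuplicates": fields(required=["subset"]),
    "TruncateBytes": fields(required=["attributes", "byte_max"]),
})

print(transformation({"type": "RenameAttributes", "attributes": {}, "foo": "bar"}))
# ['foo: Unexpected element']
print(transformation({"type": "Renamettributes", "attributes": {}}))
# ["type: 'Renamettributes' is not one of: DropDuplicates, RenameAttributes, TruncateBytes"]
```

The same stray `foo: bar` that produced the long listing above yields one line here, because only the variant selected by `type` is ever consulted.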

There are a few other places where we have a similar structure, but the child keys differ more between the options, which makes it really hard to figure out which key is the problem.

We use these tests in a CI pipeline that has less-technical contributors, and an error message like this really scares them off, even though I have learned how to read it.

mechie commented 2 years ago

Sounds like we're facing a very similar problem, actually! Less-technical contributors to a CI pipeline with Yamale-validated YAML configs getting very scary errors stemming from any(include() * N). You're right that it doesn't scale well beyond, say, 3-4 items.

I've been thinking about how to improve the error messaging, but improving the schema definitions/parsing might be easier after all. It's been hard to make time for this since it's not a frequent issue. If I do get the time to come up with a solution, I'll update this issue.

drmull commented 2 years ago

I have added some error-pruning logic before. My approach was to look at the 'depth' of each error, i.e. the length of its path. In the 'any' case, keep only the errors with the longest path; in the map and list cases, keep only the errors with the shortest path. It worked fairly well in practice.
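The two filtering rules described above can be sketched in a few lines of Python, assuming each error is a `(path, message)` pair. Where these filters get applied inside a validator tree is left out, and the sample errors are made up for illustration; the rationale in the `prune_any` docstring is an interpretation, not something stated in the comment.

```python
from typing import List, Tuple

# An error is (path, message); path holds the keys/indices leading to the node.
Error = Tuple[Tuple[object, ...], str]


def prune_any(errors: List[Error]) -> List[Error]:
    """any(...) failed: keep only the errors with the longest path.

    One reading of why: the alternative that got deepest into the data
    before failing is most likely the one the author intended.
    """
    if not errors:
        return []
    deepest = max(len(path) for path, _ in errors)
    return [err for err in errors if len(err[0]) == deepest]


def prune_container(errors: List[Error]) -> List[Error]:
    """map/list failed: keep only the errors with the shortest path."""
    if not errors:
        return []
    shallowest = min(len(path) for path, _ in errors)
    return [err for err in errors if len(err[0]) == shallowest]


# Made-up errors collected from any(include('type_a'), include('type_b')).
any_errors = [
    (("a", "mode"), "'two' not in ('one',)"),                   # from type_a
    (("a", "one_paramsetX"), "Required field missing"),         # from type_a
    (("a", "two_paramset", "size"), "Required field missing"),  # from type_b, deeper
]

print(prune_any(any_errors))
# [(('a', 'two_paramset', 'size'), 'Required field missing')]
```

Note that in the log above every error sits at the same depth (`transform.transformations.1.<key>`), so depth alone would not separate the alternatives there.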