Open Mitmischer opened 3 years ago
If I was approaching this with the validators included in yamale, I think something like this would work:
thing: any(include('type_a'), include('type_b'))
---
type_a:
  mode: enum('one')
  one_paramsetX: ...
  one_paramsetY: ...
type_b:
  mode: enum('two')
  two_paramset: ...
If there were any shared parameters, I would write a "base" type with just those, and then merge them into type_a and type_b.
That said, I've been using required_if_present and required_if_missing as kwargs in a private yamale fork in order to change requiredness based on sibling nodes (not) existing, which sounds analogous to what you bring up last. Might be something to think about upstreaming.
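The fork mentioned above is private, so its actual behaviour is not visible here. As a rough illustration of what sibling-dependent requiredness could mean, here is a minimal plain-Python sketch. The function name missing_required, the rules layout, and the exact semantics are assumptions made for illustration, not the fork's or yamale's API:

```python
def missing_required(node, rules):
    """Return the keys that are required in `node` because of sibling keys.

    `rules` maps a key to a dict that may name a sibling under
    'required_if_present' or 'required_if_missing' (hypothetical layout).
    """
    missing = []
    for key, rule in rules.items():
        if key in node:
            continue  # the key is there, nothing to report
        if_present = rule.get("required_if_present")
        if_missing = rule.get("required_if_missing")
        if if_present is not None and if_present in node:
            missing.append(key)
        elif if_missing is not None and if_missing not in node:
            missing.append(key)
    return missing


# Hypothetical rules reusing the key names from the schema above.
rules = {
    "one_paramsetY": {"required_if_present": "one_paramsetX"},
    "two_paramset": {"required_if_missing": "one_paramsetX"},
}

print(missing_required({"one_paramsetX": 1}, rules))
print(missing_required({}, rules))
```

With these rules, one_paramsetY is only demanded once one_paramsetX shows up, and two_paramset is only demanded when one_paramsetX is absent.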
Cerberus looks really interesting, hadn't heard of it before, thanks for bringing it up!
@mechie I've been using an approach like this for a while now and we've started drowning under it. We have a configuration that has a list of dictionaries, and a type key in each dictionary determines what rules the other key-values have to abide by. Here is what our any(*include) looks like:
transform: include('transform')
---
transform:
  transformation: include('transformations')
transformations: >-
  list(
    any(
      include('AddMissingColumns'),
      include('AsType'),
      include('ConvertMicroSecsToMilli'),
      include('DefaultValues'),
      include('Denest'),
      include('DropDuplicates'),
      include('DuplicateColumns'),
      include('ETLImportDate'),
      include('FormatDates'),
      include('RowNumbers'),
      include('FromJson'),
      include('HashAttributes'),
      include('LowerCaseColumnNames'),
      include('LowerCaseKeys'),
      include('NullifyValuesWithLetters'),
      include('OrderColumns'),
      include('RegexSub'),
      include('RemoveRowsWhereNull'),
      include('RemoveRowsWhereNotNull'),
      include('RemoveRowsWithEmptyList'),
      include('RemoveUnexpectedColumns'),
      include('RenameAttributes'),
      include('Replace'),
      include('SliceStrings'),
      include('Sort'),
      include('StandardizeColumns'),
      include('StripHTML'),
      include('StripWhitespace'),
      include('StrReplace'),
      include('ToDatetime'),
      include('ToInt'),
      include('ToIntAmongStrings'),
      include('ToJson'),
      include('TruncateBytes'),
      include('AirTableTransform'),
      include('AddSourceS3Key'),
      include('AddValuesFromSourceTags')
    ),
    required=True
  )
However, this results in an INSANE failure message even when there is only 1 violation. The dictionary being validated is compared against every single one of the includes, which creates a needle-in-a-haystack problem when trying to figure out which key is at fault. Here is a case where I added foo: bar to one of the dictionaries, and the result is:
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('AddMissingColumns',)
transform.transformations.1.desired_columns: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('AsType',)
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ConvertMicroSecsToMilli',)
transform.transformations.1.attribute: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('DefaultValues',)
transform.transformations.1.defaults: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('Denest',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('DropDuplicates',)
transform.transformations.1.subset: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('DuplicateColumns',)
transform.transformations.1.originals: Required field missing
transform.transformations.1.duplicates: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ETLImportDate',)
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('FormatDates',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.formats: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RowNumbers',)
transform.transformations.1.column_name: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('FromJson',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('HashAttributes',)
transform.transformations.1.hash_column_name: Required field missing
transform.transformations.1.hash_columns: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('LowerCaseColumnNames',)
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('LowerCaseKeys',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('NullifyValuesWithLetters',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('OrderColumns',)
transform.transformations.1.column_order: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RegexSub',)
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWhereNull',)
transform.transformations.1.attribute: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWhereNotNull',)
transform.transformations.1.attribute: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RemoveRowsWithEmptyList',)
transform.transformations.1.attribute: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('RemoveUnexpectedColumns',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('Replace',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.replacements: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('SliceStrings',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.slice_args: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('Sort',)
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('StandardizeColumns',)
transform.transformations.1.desired_columns: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('StripHTML',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('StripWhitespace',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('StrReplace',)
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ToDatetime',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ToInt',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ToIntAmongStrings',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('ToJson',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('TruncateBytes',)
transform.transformations.1.attributes: '{}' is not a list.
transform.transformations.1.byte_max: Required field missing
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('AirTableTransform',)
transform.transformations.1.accepted_values: Required field missing
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('AddSourceS3Key',)
transform.transformations.1.attributes: Unexpected element
transform.transformations.1.foo: Unexpected element
transform.transformations.1.type: 'RenameAttributes' not in ('AddValuesFromSourceTags',)
transform.transformations.1.columns_from_tags: Required field missing
What yamale needs is an operator that lets you define the schema at a certain key path based on the value of a key at that key path.
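To make the idea concrete, here is a minimal plain-Python sketch of such a dispatch. It is not a yamale feature: validate_by_type and the SCHEMAS table are hypothetical names, and the two schemas are cut down to the fields that the error output above implies (attributes for RenameAttributes, subset for DropDuplicates):

```python
def validate_by_type(item, schemas, key="type"):
    """Validate `item` against the one schema selected by item[key]."""
    kind = item.get(key)
    if kind not in schemas:
        return [f"{key}: {kind!r} not in {tuple(sorted(schemas))}"]

    schema = schemas[kind]
    errors = []
    for name in sorted(schema["required"]):
        if name not in item:
            errors.append(f"{name}: Required field missing")

    allowed = {key} | schema["required"] | schema.get("optional", set())
    for name in item:
        if name not in allowed:
            errors.append(f"{name}: Unexpected element")
    return errors


# Hypothetical, heavily reduced schema table.
SCHEMAS = {
    "RenameAttributes": {"required": {"attributes"}},
    "DropDuplicates": {"required": {"subset"}},
}

item = {"type": "RenameAttributes", "attributes": {}, "foo": "bar"}
print(validate_by_type(item, SCHEMAS))
```

Because the type value picks exactly one schema, the stray foo key produces a single error instead of one block of errors per include.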
There are a few other places where we have a similar structure, but the child keys differ more between the options, which makes it really hard to figure out which key is the actual problem.
We use these tests in a CI pipeline that has less technical contributors, and an error message like this really scares them off, even though I have learned how to read it.
Sounds like we're facing a very similar problem, actually! Less-technical contributors to a CI pipeline with Yamale-validated YAML configs getting very scary errors stemming from any(include() * N). You're right that it doesn't scale well beyond, say, 3-4 items.
I've been thinking about how to improve the error messaging, but improving the schema definitions/parsing might be easier after all. It's been hard to make time for this since it's not a frequent issue. If I do get the time to come up with a solution, I'll update this issue.
I have added some error pruning logic before. My approach was to look at the 'depth' of the error, i.e. the length of the path. In the 'any' case keep only the errors with the longest path. In the map and list case keep only the errors with the shortest path. It worked fairly well in practice.
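The pruning heuristic described above can be sketched roughly as follows. This is not the commenter's code: it assumes each error is a (path, message) pair with the path given as a tuple of keys, and the function names are made up for illustration:

```python
def path_depth(error):
    """Depth of an error, i.e. the number of segments in its path."""
    path, _message = error
    return len(path)


def prune_any(errors):
    """For an 'any' node, keep only the errors with the longest path."""
    if not errors:
        return []
    deepest = max(path_depth(e) for e in errors)
    return [e for e in errors if path_depth(e) == deepest]


def prune_container(errors):
    """For a map or list node, keep only the errors with the shortest path."""
    if not errors:
        return []
    shallowest = min(path_depth(e) for e in errors)
    return [e for e in errors if path_depth(e) == shallowest]


# Hypothetical errors collected from the two branches of an any().
any_errors = [
    (("thing", "mode"), "'two' not in ('one',)"),
    (("thing", "two_paramset", "size"), "'x' is not an int."),
]
print(prune_any(any_errors))
```

The reasoning is that the branch which got furthest into the data is most likely the one the author intended. Note that errors of equal depth all survive, so this helps most when the intended branch fails deeper than the others.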
First of all, thank you for the great package!
I have the following use case and wonder if it's already implemented or if it's missing from Yamale.
In this case, one_paramsetX and one_paramsetY are only required for object a, and two_params is only required for object b. The decision should be based on the value of mode. To me, there currently seems to be no way to make this work within yamale.
A simpler use case would be to make the "requiredness" of some keys depend on the presence of other keys, not their values. That would work a bit like the dependencies or contains rules of Cerberus: https://docs.python-cerberus.org/en/stable/validation-rules.html