23andMe / Yamale

A schema and validator for YAML.
MIT License

Validation of fields in sub-levels #211

Closed felipe88alves closed 1 year ago

felipe88alves commented 1 year ago

I'm trying to validate a known image field, but it can appear under many different parent fields (which are not relevant for validation).

Example:

  1. Sub-level - Global
    values:
      global:
        image:
          repository: str()
          tag: str()
  2. Sub-level - Controller App
    values:
      controller:
        image:
          repository: str()
          tag: str()
  3. Sub-level - App-B
    values:
      app-b:
        image:
          repository: str()
          tag: str()

Is there a way to define an "any" field name to replace global, controller and app-b fields in the examples above? Or perhaps skip the validation at this level but still allow the validation of the image fields underneath them?

mildebrandt commented 1 year ago

You need to use include and map validators. Here's a schema that would validate all of your samples (the validator for keys of a map is str() by default):

values: map(include('image_map'))
---
image_map:
  image:
    repository: str()
    tag: str()
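
As a rough mental model, the schema above accepts any key name directly under `values`, as long as each value is a map holding an `image` map with string `repository` and `tag` fields. The sketch below expresses that shape in plain Python on an already-parsed dictionary. It is not Yamale's implementation, `matches_first_schema` is a hypothetical name, and it does not model strict mode (which would also reject extra keys).

```python
def matches_first_schema(doc):
    """Return True if every entry under 'values' holds an 'image' map
    with string 'repository' and 'tag' fields."""
    values = doc.get("values")
    if not isinstance(values, dict):
        return False
    for entry in values.values():
        # map(...) accepts any key name; only the shape of the value matters.
        image = entry.get("image") if isinstance(entry, dict) else None
        if not isinstance(image, dict):
            return False
        if not all(isinstance(image.get(k), str) for k in ("repository", "tag")):
            return False
    return True


sample = {
    "values": {
        "global": {"image": {"repository": "nginx", "tag": "1.25"}},
        "app-b": {"image": {"repository": "redis", "tag": "7"}},
    }
}
print(matches_first_schema(sample))
```

The key names (`global`, `app-b`) never appear in the check, which is the point of using `map` here.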

However, I assume you'll want other keys than just image in your structure. Here's a more flexible way to do that:

values: map(include('value_map'))
---
value_map:
  image: include('image_map')
  type: str(required=False)
  size: int(required=False)
---
image_map:
  repository: str()
  tag: str()

felipe88alves commented 1 year ago

Hi @mildebrandt, thanks for the help. I hadn't really understood the map validator until your example. This all seems very rigid. Is this a design choice for Yamale?

I actually have this image field on a variety of "levels", and the files have a number of other fields that I wish to ignore (I'm using the strict=False flag to handle that for now). Can I define the logic you laid out for image on different levels in a straightforward manner?

Examples:

values:
  image:
    repository: str()
    tag: str()

values:
  global:
    image:
      repository: str()
      tag: str()
  random_field:
    sub: to_be_ignored

values:
  other:
    global:
      image:
        repository: str()
        tag: str()

nbaju1 commented 1 year ago

I don't think Yamale is created to handle dynamic schemas, which I assume is what you are really asking for (i.e. validate the object "image" wherever it may be defined). I believe you will have to create different schemas for all the various possibilities in your setup.

felipe88alves commented 1 year ago

Thanks for the feedback nbaju1. Would this be an interesting feature to introduce? Or does it go against the tool's intent/design?

mildebrandt commented 1 year ago

I can't comment on whether this would be a feature that would be welcome since I'm no longer a maintainer.

However, here's a schema that will validate all the examples so far:

values: include('value_map')
---
value_map:
  image: include('image_map', required=False)
  global: include('global_map', required=False)
  other: include('other_map', required=False)

other_map:
  image: include('image_map', required=False)
  global: include('global_map', required=False)

global_map:
  image: include('image_map', required=False)

image_map:
  repository: str()
  tag: str()

You would still need to specify the --no-strict flag to bypass any unexpected tags.
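
Since a schema like this has to spell out every path to `image`, it can help to first list where `image` actually occurs in a given data file. Below is a minimal sketch in plain Python that works on an already-parsed dictionary. `find_key_paths` is a hypothetical helper, not part of Yamale, and it only descends into nested maps (lists are skipped).

```python
def find_key_paths(node, key="image", prefix=()):
    """Yield the dotted path of every occurrence of `key` in nested dicts."""
    if not isinstance(node, dict):
        return
    for name, child in node.items():
        path = prefix + (str(name),)
        if name == key:
            yield ".".join(path)
        else:
            # Only keep descending when this key is not the one we look for.
            yield from find_key_paths(child, key, path)


data = {
    "values": {
        "image": {"repository": "nginx", "tag": "1.25"},
        "global": {"image": {"repository": "nginx", "tag": "1.25"}},
        "other": {"global": {"image": {"repository": "nginx", "tag": "1.25"}}},
        "random_field": {"sub": "to_be_ignored"},
    }
}
for path in sorted(find_key_paths(data)):
    print(path)
```

Each reported path corresponds to one chain of `include` entries that the schema has to define.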

felipe88alves commented 1 year ago

Learning loads here, thanks @mildebrandt :)

So I tried combining both suggestions you've made to fulfill my requirements. The goal is to capture the occurrence of the image map, regardless of its level/indentation/location. The proposal is to use map() recursively to capture image at several levels, and subset to say that the image map may be found in one or more of these locations.

Sample:

values:
  image:
    repository: str()
    tag: str()
  global:
    image:
      repository: str()
      tag: str()
    extra_map_global:
      extra_subfield_global: <ignore>
    extra_field1_global: <ignore>
  extra_map:
    extra_subfield: <ignore>
  extra_field1: <ignore>

I was trying something like this, but I must be missing something. Schema:

values: subset(include('image_map', required=False), map(include('image_map', required=False), required=False))
---
image_map:
  image:
    repository: str()
    tag: str()

second_level_image_map: map(include('image_map', required=False), required=False)

I get the following errors:

YAML Validation failed!
Error validating data '/Users/afelipe/Documents/work/repos/dev-platform/apps/aaatest/releases/test/values/release-profile-development.yaml' with schema '/Users/afelipe/Documents/work/repos/dev-platform/vdpctl/etc/schema-validation/releases/schemas/release-profile-development.schema'
    values.image: Required field missing
    values.global.image.repository: Required field missing
    values.image: Required field missing
    values.extra_map.image: Required field missing
    values.image: Required field missing
    values.extra_field1 : '<ignore>' is not a map

It seems like there may be two issues:

  1. The subset validator ignores the strict=false statement and fails because it doesn't find the image subfield within extra_map.
  2. The map validator fails because the extra_field fields are not maps, regardless of the `required=False` flag.

Is that the case? Do you have any suggestions on how to validate the Sample above?

Cheers

mildebrandt commented 1 year ago

Unfortunately for you, what @nbaju1 mentioned earlier is correct....you cannot validate the image tag wherever it appears in the yaml using the built-in Yamale validators. You need to be able to specify the path from the top element to where the image tag exists within the schema.

To answer your questions:

  1. Setting strict=false bypasses extra tags that aren't defined by the schema. In the above, values is defined as a collection of items that are one of the specified types. Ignoring items that don't match one of the types would cause this validator to never fail.
  2. Correct, when using subset, the value must be one of the specified types...and the schema says it must be a map of maps. Here the required doesn't do much because it's on a custom type that's used as an include so it behaves a little differently.

The other way to go is to create a custom validator. However, I have a feeling you'd have to perform all the validation of the full yaml yourself, and Yamale wouldn't really be doing anything at that point. Something like this:

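import yamale
# DefaultValidators is needed later to register the custom validator
from yamale.validators import DefaultValidators, Map

class RecursiveImageMap(Map):
    # Name used in the schema, i.e. image_map()
    tag = 'image_map'

    def validate_image_tag(self, value):
        # Recurse through entire dictionary to validate all image tags
        raise NotImplementedError

    def _is_valid(self, value):
        # First make sure the value is a map at all
        is_map = super()._is_valid(value)
        if not is_map:
            return False

        return self.validate_image_tag(value)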

Is this yaml structure something that you control? If so, perhaps it would be good to revisit the structure to remove some ambiguity.

felipe88alves commented 1 year ago

That's perfectly fine. I'll look into creating custom validators then. This is intended for a SaaS platform, so we don't always have full control of the yaml structure. Can you recommend any further documentation on creating custom validators? There's very little in the README.

I'm curious as to why you recommended the use of the RecursiveImageMap(Map) as opposed to RecursiveImageMap(Validator) for example. Especially given that you've defined the _is_valid method. The only other example I found where Validator wasn't used as the argument of a custom validator was here, where the Regex validator was used. In that case, only an __init__ method was defined.

I'll try to devise a generic solution. If I manage, I'll propose a dynamic validator tag that fulfills this purpose.

mildebrandt commented 1 year ago

To be extra clear, I don't recommend using a custom validator in your case....or even Yamale. What I have in mind would basically not use Yamale other than as a yaml deserializer. In my head, you'd have this schema:

values: image_map()

And the method I defined above would look something like this:

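def validate_image_tag(self, value):
    # Anything that isn't a map has no image tag to check
    if not isinstance(value, dict):
        return True

    # Note: returns as soon as an image key is found at this level,
    # so sibling keys are not searched any further
    if 'image' in value:
        return 'repository' in value['image'] and 'tag' in value['image']

    for v in value.values():
        valid = self.validate_image_tag(v)
        if not valid:
            return False

    return True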

I didn't test that code, and it would need enhancements. However, hopefully you can see that all the usefulness of the Yamale library is gone...and it's just taking the full dictionary representation of your yaml and recursively looking for the image tag.
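
One possible enhancement of the method above, sketched as a standalone function in plain Python: it keeps walking sibling keys after it finds an `image`, checks that `image` is itself a map, and collects a readable path for each problem instead of returning a single boolean. The name `find_image_errors`, the error wording, and the choice to ignore lists are assumptions made for illustration, not Yamale behaviour.

```python
def find_image_errors(node, path="values"):
    """Return a list of error strings, one per malformed 'image' map."""
    errors = []
    if not isinstance(node, dict):
        return errors  # scalars and lists are ignored in this sketch
    for key, child in node.items():
        child_path = f"{path}.{key}"
        if key == "image":
            if not isinstance(child, dict):
                errors.append(f"{child_path}: is not a map")
                continue
            for field in ("repository", "tag"):
                if not isinstance(child.get(field), str):
                    errors.append(f"{child_path}.{field}: missing or not a string")
        else:
            # Keep walking so sibling and deeper 'image' maps are checked too.
            errors.extend(find_image_errors(child, child_path))
    return errors


values = {
    "image": {"repository": "nginx", "tag": "1.25"},
    "global": {
        "image": {"repository": "nginx"},  # 'tag' is missing here
        "extra_map_global": {"extra_subfield_global": "ignored"},
    },
    "extra_field1": "ignored",
}
for error in find_image_errors(values):
    print(error)
```

An empty list means every `image` map that was found is well formed. Fields other than `image` are skipped, which is the behaviour the thread asks for.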

I'd avoid Yamale in this use case, it just doesn't fit.

felipe88alves commented 1 year ago

Hi, sorry for the delay. I may have finally understood what you meant.

If I understood correctly, the custom validator logic and the logic to iterate through the schema are completely independent from one another. So a validator will only be applied to a single line at a time. It cannot be provided with a multiline schema.

If that's it, then I get it. I'll close the issue. Thank you so much for all the support @mildebrandt, it's been super :)