23andMe / Yamale

A schema and validator for YAML.

Dynamic validation based on keys in data (internal reference) #253

Open idantene opened 3 weeks ago

idantene commented 3 weeks ago

Hey,

I have a thought for dynamic schema validation that I think is currently lacking (perhaps not a very common request, though I believe it could be implemented as another validator).

In this case, I have a data YAML file where a mapping is expected. The keys may be anything the user decides, and the values follow some predefined schema - so far, so good.

Next, at a later point in the data YAML, another mapping occurs. Here, the values of that mapping have to refer back to the keys defined earlier.

For example (a data YAML):

metrics:
  iou:  # this can be whatever the user chooses
    name: foo
    unit: bar
    direction: up
...
results:
  - name: something
    metric: iou  # this has to be one of the keys defined under `metrics` above
    value: 59
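
For what it's worth, until something like this exists, the reference can be enforced outside the schema as a post-validation step. A rough sketch in Python (the file names are hypothetical; make_schema, make_data and validate are Yamale's documented API):

import yamale

schema = yamale.make_schema('./schema.yaml')
data = yamale.make_data('./data.yaml')

# Validate everything the schema can express today.
yamale.validate(schema, data)

# Manual check of the internal reference: every results[*].metric must be
# one of the keys defined under the top-level `metrics` mapping.
for doc, path in data:  # make_data returns (document, path) tuples
    known = set((doc.get('metrics') or {}).keys())
    for result in doc.get('results') or []:
        if result.get('metric') not in known:
            raise ValueError(f"{path}: unknown metric {result.get('metric')!r}, "
                             f"expected one of {sorted(known)}")
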
nbaju1 commented 2 weeks ago

A similar feature has been requested earlier: #154

idantene commented 2 weeks ago

I hadn't noticed that one, my apologies.

I'm not sure why it's not within the scope of this project. Internal references are common, and since the file is read anyway, the schema can be dynamic in that sense...

nbaju1 commented 2 weeks ago

The validators themselves only have access to the object being validated, so I imagine this would require quite a large refactoring of the project to support this.

idantene commented 2 weeks ago

Is that the case? It's been a while since I contributed to the project so I don't recall the details, but it seems that in schema.py#L80, the full contents of data are passed around.

That would suggest, for example, that this type of validator could be deferred until the data is provided.

nbaju1 commented 2 weeks ago

It's the Validator class that is used to validate an object, and it only receives the object itself, not the full YAML file. So this can't be solved simply by introducing a new validator.
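
For context, Yamale's documented custom-validator hook looks roughly like the sketch below; _is_valid only ever receives the single value being checked (the metric tag is made up for this example):

import yamale
from yamale.validators import DefaultValidators, Validator

class Metric(Validator):
    tag = 'metric'  # hypothetical tag for this feature request

    def _is_valid(self, value):
        # Only `value` (e.g. the string 'iou') is available here; there is no
        # handle on the rest of the parsed document, so the keys defined under
        # `metrics:` cannot be consulted from inside a validator.
        return isinstance(value, str)

validators = DefaultValidators.copy()
validators[Metric.tag] = Metric
schema = yamale.make_schema('./schema.yaml', validators=validators)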

idantene commented 2 weeks ago

Of course, this would require a somewhat more involved implementation (i.e. not simply subclassing the Validator class). That shouldn't be a problem though.

If I understand correctly, the full flow is as follows:

  1. Create a dictionary of validators in Schema (method _process_schema returns a dictionary which is then assigned to self._schema).
  2. Calls to Schema.validate pass the full YAML data content.
  3. The validate method passes along the dictionary self._schema, which eventually winds down to the _validate_static_map_list method.
  4. In _validate_static_map_list, the keys of the data and the keys of the validator map are compared. If they mismatch, an error is raised. If they do match, we start iterating on a per-key basis by passing along the sub_validator, the key, and again, the full YAML data, to _validate_item.
  5. Finally, in _validate_item, we try to pull the relevant data-item from the full (or parent, since it recurses down eventually) YAML content, and then call _validate with the validator and data-item.

So my suggestion would be to allow validators to be deferred at a higher level here. For example, a Validator could hold a boolean is_deferred, in which case we would not attempt to pull the specific data-item from the YAML content, but would instead pass along either the parent or the full YAML data, depending on the reference type.
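
Purely to illustrate that idea (the is_deferred flag, the function name and its signature below are assumptions, not Yamale's actual internals):

# Sketch of the proposed dispatch, roughly where _validate_item sits today.
def validate_item(validator, data, key):
    if getattr(validator, 'is_deferred', False):
        # Deferred: hand the validator the parent/full document instead of the
        # single item data[key], so it can resolve internal references
        # (e.g. results[*].metric -> keys of the `metrics` mapping).
        return validator._is_valid(data)
    # Normal path: pull the single item and validate just that.
    return validator._is_valid(data[key])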

abourree commented 2 weeks ago

Hi,

Five years ago, I proposed two new Validators in #82 that may answer your need. My PR was rejected because the maintainers didn't want dynamic schemas. In a way, it is good practice to keep schemas static.

Arnaud.