Galileo-Galilei / kedro-pandera

A kedro plugin to use pandera in your kedro projects
https://kedro-pandera.readthedocs.io/en/latest/
Apache License 2.0
33 stars, 5 forks

Enable lazy validation at a dataset level #21

Open Galileo-Galilei opened 1 year ago

Galileo-Galilei commented 1 year ago

Description

Instead of failing immediately when one check fails, pandera supports performing all checks before raising an error (lazy validation).

Context

Make debugging easier by getting all errors in a single run

Possible Implementation

Pass kwargs to schema.validate() through a config file or a dataset.metadata extra key, e.g.:

iris:
    type: pandas.CSVDataSet
    filepath: /path/to/iris.csv
    metadata:
        pandera:
            schema: ${pa.yaml: _iris_schema}
            validation_kwargs:
                lazy: True

This key can ultimately support all the arguments available in the validate method: https://pandera.readthedocs.io/en/stable/reference/generated/methods/pandera.api.pandas.container.DataFrameSchema.validate.html
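One possible shape for this, as a rough sketch rather than the plugin's actual code: a helper reads the `pandera` entry of the dataset metadata and forwards `validation_kwargs` to `schema.validate()`. The helper name `validate_with_metadata` is hypothetical, and the sketch assumes the `schema` entry has already been resolved to an object exposing a `validate` method. A small stand-in schema is used here so the example runs without pandera installed.

```python
from typing import Any, Dict, Optional


def validate_with_metadata(data: Any, metadata: Optional[Dict[str, Any]]) -> Any:
    """Hypothetical helper: validate `data` using the `pandera` metadata entry."""
    pandera_config = (metadata or {}).get("pandera")
    if pandera_config is None:
        # No schema declared for this dataset: return the data untouched.
        return data
    schema = pandera_config["schema"]
    # A missing or empty `validation_kwargs` key falls back to validate() defaults.
    validation_kwargs = pandera_config.get("validation_kwargs") or {}
    return schema.validate(data, **validation_kwargs)


class RecordingSchema:
    """Stand-in for a pandera schema; it only records the kwargs it receives."""

    def __init__(self) -> None:
        self.received_kwargs: Optional[Dict[str, Any]] = None

    def validate(self, data: Any, **kwargs: Any) -> Any:
        self.received_kwargs = kwargs
        return data


schema = RecordingSchema()
metadata = {"pandera": {"schema": schema, "validation_kwargs": {"lazy": True}}}
result = validate_with_metadata([1, 2, 3], metadata)
print(schema.received_kwargs)  # {'lazy': True}
```

Forwarding the dictionary as-is keeps the config open-ended: any argument accepted by `validate` could be set from the catalog without changing the helper.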