Galileo-Galilei / kedro-pandera

A kedro plugin to use pandera in your kedro projects
https://kedro-pandera.readthedocs.io/en/latest/
Apache License 2.0

Allow converting the DataFrame according to the defined schema #63

Open mjspier opened 3 months ago


Description

Pandera can be used not only to validate a DataFrame but also to convert its dtypes according to the schema (when `coerce=True` is set on the schema or on individual columns).

`schema.validate` returns the validated DataFrame with the converted dtypes. We can therefore replace the input DataFrame with the validated one, so that nodes receive a DataFrame that has been both validated and converted according to the schema.

Context

Possible Implementation

Add a configuration parameter that lets each dataset specify whether it should only be validated or also converted. If a dataset is configured for conversion, the hook forwards the converted DataFrame to the node instead of the original one.

A global parameter could also set the default behaviour for all datasets that use a pandera schema.
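One way the per-dataset setting and the global default could interact is sketched below: the dataset's own value wins when present, otherwise the global default applies. The metadata key `convert` and the helper names `resolve_convert_flag` and `build_convert_map` are hypothetical illustrations, not existing kedro-pandera configuration.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_convert_flag(
    dataset_metadata: Optional[Mapping[str, Any]],
    global_default: bool = False,
) -> bool:
    """Per-dataset setting wins; otherwise fall back to the global default."""
    if not dataset_metadata:
        return global_default
    value = dataset_metadata.get("convert")  # hypothetical per-dataset key
    return global_default if value is None else bool(value)


def build_convert_map(
    metadata_by_dataset: Mapping[str, Optional[Mapping[str, Any]]],
    global_default: bool = False,
) -> Dict[str, bool]:
    """Resolve the convert flag for every dataset that declares a schema."""
    return {
        name: resolve_convert_flag(meta, global_default)
        for name, meta in metadata_by_dataset.items()
    }


flags = build_convert_map(
    {"raw_orders": {"convert": True}, "raw_users": None, "features": {"convert": False}},
    global_default=True,
)
print(flags)
```

Keeping the global default at `False` would preserve validate-only behaviour for projects that do not opt in, while a project can flip the default and still exclude individual datasets.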