DataWorkz-NL / KubeETL

ETL controller for Kubernetes
Apache License 2.0

DataSet health checks #17

Closed Blokje5 closed 3 years ago

Blokje5 commented 3 years ago

Data quality (DQ) is an often neglected aspect of ETL. To simplify setting up DQ checks in pipelines, we can include them as a default aspect of a DataSet.

A health check should execute on a cron schedule. Based on the result of the last run (job failure or job success), the status of the DataSet should be updated to healthy or unhealthy. If no health check is defined, the status of the DataSet will be unknown.
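
To make the proposed status rules concrete, here is a minimal sketch of the mapping from the last health check result to a DataSet status. It is written in Python for illustration only and is not KubeETL code. The names `Health` and `dataset_health` are hypothetical. The case where a health check is defined but has not run yet is not covered above, so reporting it as unknown is an assumption.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    """Hypothetical status values for a DataSet."""
    HEALTHY = "Healthy"
    UNHEALTHY = "Unhealthy"
    UNKNOWN = "Unknown"


def dataset_health(has_health_check: bool,
                   last_job_succeeded: Optional[bool]) -> Health:
    """Derive the DataSet status from the most recent health check job.

    has_health_check: whether a health check is defined on the DataSet.
    last_job_succeeded: True or False for the last job result,
        or None if no job has finished yet.
    """
    if not has_health_check:
        # No health check defined: status is unknown.
        return Health.UNKNOWN
    if last_job_succeeded is None:
        # Assumption: a defined check with no finished run is also unknown.
        return Health.UNKNOWN
    return Health.HEALTHY if last_job_succeeded else Health.UNHEALTHY
```

Only the last result counts in this sketch, so a single successful run after a failure flips the DataSet back to healthy. Whether a threshold of consecutive failures would be preferable is an open design question.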