apache / datafusion-ballista

Apache DataFusion Ballista Distributed Query Engine
https://datafusion.apache.org/ballista
Apache License 2.0

Data quality framework #802

Open explicite opened 1 year ago

explicite commented 1 year ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do.


Describe the solution you'd like

When building a DAG of transformations, I want to be able to define tests that verify data correctness. At the end of the DAG, I should be able to review data quality and provide context to the end user if required.

As in Deequ, I could check that all IDs are unique, or that the values in some column follow the expected format. Other approaches: AWS Glue, dbt tests, or Great Expectations.
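To make the request more concrete, here is a rough sketch of the kind of checks I have in mind, written in plain Python rather than against any real API. Every name in it (`CheckResult`, `is_unique`, `matches`, `run_checks`, the `orders` table) is hypothetical and only illustrates the idea; it is not how Ballista, DataFusion, or Deequ expose this.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory table: column name -> list of values.
Table = Dict[str, List[object]]


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


Check = Callable[[Table], CheckResult]


def is_unique(column: str) -> Check:
    """Check that no value in `column` appears more than once."""
    def run(table: Table) -> CheckResult:
        values = table[column]
        counts = Counter(values)
        # Count every row whose value is shared with another row.
        failing = sum(1 for v in values if counts[v] > 1)
        return CheckResult(f"is_unique({column})", failing == 0, failing)
    return run


def matches(column: str, pattern: str) -> Check:
    """Check that every value in `column` is a string fully matching `pattern`."""
    regex = re.compile(pattern)

    def run(table: Table) -> CheckResult:
        failing = sum(
            1 for v in table[column]
            if not (isinstance(v, str) and regex.fullmatch(v))
        )
        return CheckResult(f"matches({column})", failing == 0, failing)
    return run


def run_checks(table: Table, checks: List[Check]) -> List[CheckResult]:
    """Run all checks and collect a report for review at the end of the DAG."""
    return [check(table) for check in checks]


# Made-up sample data with one duplicated id and one badly formatted date.
orders: Table = {
    "id": [1, 2, 2, 4],
    "created": ["2023-01-05", "2023-01-06", "06/01/2023", "2023-01-08"],
}

report = run_checks(
    orders,
    [is_unique("id"), matches("created", r"\d{4}-\d{2}-\d{2}")],
)
for result in report:
    print(result)
```

Returning a list of results instead of raising on the first failure is deliberate: the whole report can then be shown to the end user once the DAG has finished.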

Describe alternatives you've considered

Instead of building a new framework, it may be possible to extend Great Expectations.


YuriyGavrilov commented 1 year ago