Nike-Inc / spark-expectations

A Python library to support running data quality rules while the Spark job is running⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0
148 stars, 32 forks

[FEATURE] Update documentation #95

Closed: IMC07 closed this issue 1 month ago

IMC07 commented 1 month ago

Is your feature request related to a problem? Please describe.
Add an explanation of the meaning of the configuration settings enable_for_source_dq_validation and enable_for_target_dq_validation.

The explanation: SE runs in three phases:
1) Take the source dataframe and run the query_dq and agg_dq rules on it, when enable_for_source_dq_validation is true.
2) If the first phase succeeds, run the row_dq rules.
3) Take the dataframe produced by row_dq and run the query_dq and agg_dq rules on it, when enable_for_target_dq_validation is true.
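The phase ordering described above can be illustrated with a small plain-Python sketch. This is not the spark-expectations API: the Rule class, the run_pipeline function and the list-of-dicts stand-in for a Spark dataframe are all hypothetical. The two settings are modeled as per-rule boolean flags, and as a simplification a failed agg_dq/query_dq check always raises and a failed row_dq check always drops the row, rather than following whatever failure action the library is configured with.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Rule:
    # Hypothetical rule model, not the library's rules table.
    name: str
    rule_type: str  # "row_dq", "agg_dq" or "query_dq"
    # row_dq: check(row) -> bool; agg_dq/query_dq: check(rows) -> bool
    check: Callable[..., bool]
    enable_for_source_dq_validation: bool = False
    enable_for_target_dq_validation: bool = False


def run_dataset_rules(rows: List[Row], rules: List[Rule], flag: str) -> List[str]:
    """Run agg_dq/query_dq rules whose `flag` is set; return names of failed rules."""
    return [
        r.name
        for r in rules
        if r.rule_type in ("agg_dq", "query_dq") and getattr(r, flag) and not r.check(rows)
    ]


def run_pipeline(source: List[Row], rules: List[Rule]) -> List[Row]:
    # Phase 1: dataset-level rules on the source data (enable_for_source_dq_validation).
    failed = run_dataset_rules(source, rules, "enable_for_source_dq_validation")
    if failed:
        raise RuntimeError(f"source validation failed: {failed}")

    # Phase 2: row_dq keeps only rows that pass every row-level rule.
    row_rules = [r for r in rules if r.rule_type == "row_dq"]
    target = [row for row in source if all(r.check(row) for r in row_rules)]

    # Phase 3: dataset-level rules on the row_dq output (enable_for_target_dq_validation).
    failed = run_dataset_rules(target, rules, "enable_for_target_dq_validation")
    if failed:
        raise RuntimeError(f"target validation failed: {failed}")
    return target


rules = [
    Rule("id_not_null", "row_dq", lambda row: row["id"] is not None),
    Rule("min_rows", "agg_dq", lambda rows: len(rows) >= 2,
         enable_for_source_dq_validation=True,
         enable_for_target_dq_validation=True),
]
source = [{"id": 1}, {"id": None}, {"id": 3}]
print(run_pipeline(source, rules))  # [{'id': 1}, {'id': 3}]
```

Because the same agg_dq rule can carry both flags, it is evaluated twice: once on the three source rows and again on the two rows left after row_dq. A rule that only makes sense after cleaning (for example, "no null ids remain") would set only enable_for_target_dq_validation.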

It would also be nice to have explanations of all the other configuration settings in place.