Nike-Inc / spark-expectations

A Python Library to support running data quality rules while the spark job is running⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0

Bug fix on PR- 80 #88

Closed: vigneshwarrvenkat closed 2 months ago

vigneshwarrvenkat commented 2 months ago

Description

Related Issue

Bugs

  1. $ is the default delimiter for custom DQ queries, but the Databricks runtime escapes $ during insertion. Fix: change the default delimiter to @.

  2. The regex pattern that captures the agg DQ results fails when the expression contains spaces, e.g. sum(sales) > 100. Fix: change the regex to handle spaces.

  3. The row DQ summary results capture both the failed and the passed row DQ rules, but the detailed stats enhancement was written on the assumption that only failed results are captured. Fix: handle the pass status in the get_row_dq_detailed_stats method.
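
The three fixes above can be illustrated with a minimal, standalone sketch. It is not the library's actual code: the helper names (`split_custom_query`, `parse_agg_expectation`, `row_dq_detailed_stats`), the regex, and the dictionary keys (`rule`, `status`, `failed_row_count`) are all assumptions made for illustration. Only the @ delimiter, the `sum(sales) > 100` example, and the pass/fail distinction come from the description.

```python
import re

# Assumed default after this PR: "@" instead of "$" (hypothetical constant name).
QUERY_DELIMITER = "@"


def split_custom_query(raw, delimiter=QUERY_DELIMITER):
    """Split a delimited custom-query string into stripped, non-empty parts."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


# Whitespace-tolerant pattern for: <func>(<column>) <operator> <number>.
# Illustrative only; the real pattern in the library may differ.
AGG_EXPR = re.compile(
    r"^\s*(\w+)\s*\(\s*([\w.*]+)\s*\)\s*"   # aggregate function and its column
    r"(>=|<=|!=|==|=|>|<)\s*"               # comparison operator, longest first
    r"(-?\d+(?:\.\d+)?)\s*$"                # numeric threshold
)


def parse_agg_expectation(expr):
    """Return (func, column, operator, threshold), or None if expr does not match."""
    match = AGG_EXPR.match(expr)
    if match is None:
        return None
    func, column, operator, threshold = match.groups()
    return func.lower(), column, operator, float(threshold)


def row_dq_detailed_stats(rule_results, total_rows):
    """Build per-rule stats, handling both 'pass' and 'fail' entries.

    Each entry in rule_results is a hypothetical dict with the keys
    'rule', 'status' and, for failed rules, 'failed_row_count'.
    """
    stats = []
    for rule in rule_results:
        status = rule.get("status", "fail")
        # A passed rule has no failed rows, so do not read a failure count for it.
        failed = 0 if status == "pass" else int(rule.get("failed_row_count", 0))
        stats.append(
            {
                "rule": rule["rule"],
                "status": status,
                "failed_rows": failed,
                "passed_rows": total_rows - failed,
            }
        )
    return stats
```

With this sketch, `parse_agg_expectation("sum(sales) > 100")` and `parse_agg_expectation("sum(sales)>100")` return the same tuple, which is the behaviour fix 2 is after, and a rule with status `pass` is reported with zero failed rows instead of being treated as a failure.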

Motivation and Context

How Has This Been Tested?

Tested with the sample data

Screenshots (if appropriate):

Types of changes

Checklist: