Nike-Inc / spark-expectations

A Python library to support running data quality rules while the Spark job is running ⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0
148 stars, 32 forks

Improving the failed-record writing process and enabling the detailed stats table #69

Closed: jskrajareddy21 closed this 4 months ago

jskrajareddy21 commented 4 months ago

Description

This PR solves two issues: https://github.com/Nike-Inc/spark-expectations/issues/68 and https://github.com/Nike-Inc/spark-expectations/issues/50

Motivation and Context

See the two linked issues (#68 and #50).

How Has This Been Tested?

Performance was tested on 44 million records, and both the write_error_records_final and action_on_rules functions performed better after the change. In addition, this PR introduces a new feature that provides a detailed stats table in a relational format.

asingamaneni commented 4 months ago

Had a call with the team and converted the PR to a draft until the issues are fixed.