Nike-Inc / spark-expectations

A Python Library to support running data quality rules while the spark job is running⚡
https://engineering.nike.com/spark-expectations
Apache License 2.0

[FEATURE] Performance improvement #68

Closed by pinnapuvijay 1 month ago

pinnapuvijay commented 4 months ago

Is your feature request related to a problem? Please describe. There is an opportunity to improve performance so that pipeline run times are not significantly affected. The "write error records" and "action" steps take significantly longer than the other steps at runtime.
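To put numbers on which steps dominate before tuning, a per-step timer could be used. The following is only a minimal sketch: the `timed` helper and the `run_rules` / `write_error_records` functions are hypothetical stand-ins for illustration, not part of the spark-expectations API.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> accumulated wall-clock seconds


@contextmanager
def timed(step):
    """Add the wall-clock time spent inside the with-block to timings[step]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step] = timings.get(step, 0.0) + (time.perf_counter() - start)


# Hypothetical stand-ins for pipeline steps; replace with the real calls.
def run_rules():
    time.sleep(0.001)


def write_error_records():
    time.sleep(0.05)


with timed("run rules"):
    run_rules()
with timed("write error records"):
    write_error_records()

# Report steps from slowest to fastest.
slowest = max(timings, key=timings.get)
for step, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{step}: {seconds:.3f}s")
```

Because Spark evaluates transformations lazily, the `with` block would need to wrap the call that actually triggers the job (a write or another action) for the measured time to be meaningful.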

Describe the solution you'd like: Tune the "write error records" and "action" steps so that they run faster.
