ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

Data Transformation - Implement Validation / Testing / Notifications (not to be confused with file output validation). #45

Open martynspooner opened 8 months ago

martynspooner commented 8 months ago

What is this?

Transform-level validation, i.e. if we receive 100 rows / records of data as input, we output those same 100 rows / records.
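A minimal sketch of the row-count check described above, assuming records are held as lists of dicts. `TransformValidationError` and `validate_row_count` are hypothetical names for illustration, not existing parts of dp-data-pipelines.

```python
class TransformValidationError(Exception):
    """Hypothetical error raised when a transform output fails validation."""


def validate_row_count(input_records, output_records):
    """Raise if the transform dropped or added records."""
    expected = len(input_records)
    actual = len(output_records)
    if expected != actual:
        raise TransformValidationError(
            f"Row count mismatch: received {expected} records, output {actual}"
        )


# Usage: a transform that changes values but keeps every record.
source = [{"id": i, "value": i * 2} for i in range(100)]
transformed = [{"id": r["id"], "value": str(r["value"])} for r in source]
validate_row_count(source, transformed)  # no error: 100 in, 100 out
```

Raising an exception rather than returning a flag means a failed check stops the transform instead of passing silently.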

Ultimately, we are implementing data quality metrics: monitoring for anomalies, outliers, missing values, unexpected duplicates and so on, to ensure the integrity and accuracy of the data.
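As a rough illustration of two of these checks (missing values and unexpected duplicates), again assuming records are lists of dicts. The function name `find_quality_issues` and its parameters are hypothetical, and outlier or anomaly detection is deliberately left out because it depends on the dataset.

```python
from collections import Counter


def find_quality_issues(records, key_field, required_fields):
    """Return a list of human-readable issues found in the records."""
    issues = []

    # Missing values: a required field is absent, None, or an empty string.
    for index, record in enumerate(records):
        for field in required_fields:
            if record.get(field) in (None, ""):
                issues.append(f"record {index}: missing value for '{field}'")

    # Unexpected duplicates: the same key appears more than once.
    key_counts = Counter(record.get(key_field) for record in records)
    for key, count in key_counts.items():
        if key is not None and count > 1:
            issues.append(f"duplicate key {key!r} appears {count} times")

    return issues


# Usage: one missing value and one duplicated id.
records = [
    {"id": 1, "value": 10},
    {"id": 2, "value": None},
    {"id": 1, "value": 30},
]
issues = find_quality_issues(records, key_field="id", required_fields=["id", "value"])
```

Collecting every issue before reporting gives one complete message per run rather than one failure at a time.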

We then use the Notification component(s) to raise alerts about any detected anomalies or issues, so that the code or the data can be fixed.

What to do:

You need to:

Implement testing & validation sections in the transform code to ensure we have performed the transform correctly.

Implement the notification components to signal where invalid transforms are captured / identified.

Acceptance Criteria

martynspooner commented 8 months ago

Timeboxed at 10 days.

mikeAdamss commented 7 months ago

Have asked Jim not to bother with notifications for this. The functions in the pipeline are all wrapped in a try/catch that'll handle that (as long as the function raises an error on hitting an issue), notifying of a failure with a link back to the logs.
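A sketch of the pattern described here, not the actual pipeline code: `run_step`, `notify_failure` and the `log_url` placeholder are all hypothetical. The point is that validation code only needs to raise, and the wrapper turns that into a notification.

```python
import logging

logger = logging.getLogger("pipeline")


def notify_failure(message):
    """Hypothetical stand-in for the real notification component."""
    logger.error(message)


def run_step(step, *args, notify=notify_failure, log_url="<link to logs>"):
    """Run one pipeline function; notify and re-raise if it fails."""
    try:
        return step(*args)
    except Exception as err:
        notify(f"{step.__name__} failed: {err} (logs: {log_url})")
        raise


def check_row_count(input_records, output_records):
    """Example validation step: raises on a mismatch, returns the output otherwise."""
    if len(input_records) != len(output_records):
        raise ValueError(
            f"expected {len(input_records)} records, got {len(output_records)}"
        )
    return output_records


# Usage: a passing check returns normally and sends no notification.
result = run_step(check_row_count, [1, 2, 3], ["a", "b", "c"])
```

With this shape, the transform validation needs no notification logic of its own; it only has to raise a descriptive error.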