ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

Data Transformation - Implement Validation / Testing / Notifications for SDMX 2.1 Transform #127

Open osamede20 opened 5 months ago


What is this?

SDMX 2.1 transform-level validation, i.e. if we receive 100 rows (records) of data as input, we output the same 100 rows (records).
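Below is a minimal Python sketch of this row-count check. It assumes the input is an SDMX-ML 2.1 data message and that each `<Obs>` element should become exactly one row of the tidy CSV. The function names are hypothetical and do not come from the existing code in this repo:

```python
import csv
import io
import xml.etree.ElementTree as ET


def count_sdmx_observations(sdmx_xml: str) -> int:
    """Count <Obs> elements in an SDMX 2.1 data message, ignoring namespaces.

    Matching on the local tag name lets the same check work for both the
    generic and the structure-specific SDMX-ML 2.1 data formats.
    """
    root = ET.fromstring(sdmx_xml)
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Obs")


def count_tidy_csv_rows(csv_text: str) -> int:
    """Count data rows in a tidy CSV, excluding the header row."""
    return sum(1 for _ in csv.DictReader(io.StringIO(csv_text)))


def validate_row_count(sdmx_xml: str, csv_text: str) -> None:
    """Raise ValueError if the transform dropped or duplicated observations.

    Assumption: one SDMX observation maps to exactly one tidy CSV row.
    """
    expected = count_sdmx_observations(sdmx_xml)
    actual = count_tidy_csv_rows(csv_text)
    if expected != actual:
        raise ValueError(
            f"Row count mismatch: {expected} observations in SDMX input, "
            f"{actual} rows in tidy CSV output"
        )
```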

Ultimately, we are implementing data quality metrics that monitor for anomalies, outliers, missing values, unexpected duplicates, etc., to ensure the integrity and accuracy of the data.
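As a starting point, these metrics could be computed over the parsed tidy CSV rows. This sketch makes several assumptions. Rows are dicts, as produced by `csv.DictReader`. The observation value lives in a hypothetical `OBS_VALUE` column. Every other column together forms the observation key. Outliers are detected with the interquartile-range (IQR) rule with a factor of 1.5, which is one possible test among several:

```python
import statistics
from collections import Counter


def quality_metrics(rows, value_column="OBS_VALUE", iqr_factor=1.5):
    """Compute simple data-quality metrics over tidy-CSV rows (list of dicts)."""
    # Missing values: count empty or None cells per column.
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                missing[column] += 1

    # Unexpected duplicates: the same key (all columns except the value column)
    # appearing more than once usually means an observation was emitted twice.
    keys = Counter(
        tuple((c, v) for c, v in sorted(row.items()) if c != value_column)
        for row in rows
    )
    duplicate_keys = sum(count - 1 for count in keys.values() if count > 1)

    # Outliers: numeric values outside [Q1 - k*IQR, Q3 + k*IQR].
    values = []
    for row in rows:
        try:
            values.append(float(row[value_column]))
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric values are skipped here
    outliers = []
    if len(values) >= 4:  # with very few points the quartiles are not meaningful
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
        outliers = [v for v in values if v < low or v > high]

    return {
        "row_count": len(rows),
        "missing_values": dict(missing),
        "duplicate_keys": duplicate_keys,
        "outliers": outliers,
    }
```

The returned dict is a natural input for the notification step. For example, any non-zero duplicate count or non-empty outlier list could trigger an alert.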

The Notification component(s) will raise alerts about any detected anomalies or issues so that code or data remediation steps can be taken.

What to do:

Implement testing and validation sections in the SDMX 2.1 transform code to confirm that the transform is correctly written and that the output tidy CSV is correct.
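One way to test that the transform is correctly written is a fixture-based unit test. The test runs the transform on a small SDMX input and compares the result with a hand-written expected tidy CSV. The real transform's interface is not given here, so `sdmx_to_tidy_csv` below is a hypothetical stand-in. It assumes structure-specific SDMX-ML 2.1, with dimensions held as `<Series>` attributes and `TIME_PERIOD`/`OBS_VALUE` held as `<Obs>` attributes. The test function is written in pytest style:

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Sequence


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def sdmx_to_tidy_csv(sdmx_xml: str, columns: Sequence[str]) -> str:
    """Hypothetical stand-in transform: one tidy CSV row per <Obs>."""
    root = ET.fromstring(sdmx_xml)
    out = io.StringIO()
    # DictWriter's default extrasaction="raise" makes any unexpected attribute
    # fail loudly instead of being silently dropped.
    writer = csv.DictWriter(out, fieldnames=list(columns), lineterminator="\n")
    writer.writeheader()
    for series in (el for el in root.iter() if local_name(el.tag) == "Series"):
        for obs in series:
            if local_name(obs.tag) == "Obs":
                # Series-level attributes (dimensions) plus observation attributes.
                writer.writerow({**series.attrib, **obs.attrib})
    return out.getvalue()


SAMPLE_SDMX = """<message:StructureSpecificData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message">
  <message:DataSet>
    <Series GEO="UK">
      <Obs TIME_PERIOD="2023" OBS_VALUE="1.5"/>
      <Obs TIME_PERIOD="2024" OBS_VALUE="2.0"/>
    </Series>
  </message:DataSet>
</message:StructureSpecificData>"""

EXPECTED_CSV = "GEO,TIME_PERIOD,OBS_VALUE\nUK,2023,1.5\nUK,2024,2.0\n"


def test_sdmx_to_tidy_csv_matches_expected_fixture():
    actual = sdmx_to_tidy_csv(SAMPLE_SDMX, ["GEO", "TIME_PERIOD", "OBS_VALUE"])
    assert actual == EXPECTED_CSV
```

Adding one fixture pair for each SDMX source shape we ingest would make regressions visible whenever the transform code changes.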

Implement the notification components to signal where in the SDMX 2.1 transform code invalid transforms are identified.
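To signal *where* an invalid transform was identified, each validation step could be wrapped so that a failure sends an alert naming the step before the error propagates. In this sketch, `Notifier`, `LoggingNotifier` and `notify_on_failure` are hypothetical names. Standard logging stands in for whatever channel (email, Slack, etc.) the Notification component actually uses:

```python
import functools
import logging
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")

logger = logging.getLogger("sdmx_transform")


class Notifier(Protocol):
    """Anything that can deliver an alert (log, email, chat webhook, ...)."""

    def notify(self, message: str) -> None: ...


class LoggingNotifier:
    """Minimal notifier that writes alerts to the standard logging system."""

    def notify(self, message: str) -> None:
        logger.error(message)


def notify_on_failure(step_name: str, notifier: Notifier):
    """Decorator: send an alert naming the failing step, then re-raise.

    Re-raising stops the pipeline from publishing an invalid output, while the
    alert records which step of the SDMX 2.1 transform the problem came from.
    """

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            try:
                return func(*args, **kwargs)
            except Exception as err:
                notifier.notify(
                    f"SDMX 2.1 transform failed at step '{step_name}': {err}"
                )
                raise

        return wrapper

    return decorator
```

For example, decorating a row-count check with `@notify_on_failure("row_count_check", LoggingNotifier())` would log an error naming `row_count_check` and then re-raise. The pipeline would then stop instead of publishing an invalid CSV.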

Acceptance Criteria