ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

email notifications for invalid data submission #110

Open mikeAdamss opened 3 months ago

mikeAdamss commented 3 months ago

What is this

We need to add the ability to inform data submitters via email when their submitted data is invalid.

We need to:

In terms of scope, we only want to send an email where the fault lies with the submitted data. For now, start with the places where we check that a file exists.
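One way to keep the scope to "the fault lies with the submitted data" is to raise a dedicated exception at each file-existence check, so the pipeline can tell submitter faults apart from its own failures and only email for the former. This is a minimal sketch; `SubmissionError` and `require_file` are hypothetical names, not existing code in the repo.

```python
from pathlib import Path


class SubmissionError(Exception):
    """Hypothetical exception for faults in the submitted data itself."""


def require_file(directory: Path, filename: str) -> Path:
    """Return the path to `filename` inside `directory`.

    Raises SubmissionError if the file is missing, because a missing file
    is a problem with the submission rather than with the pipeline.
    """
    path = directory / filename
    if not path.exists():
        raise SubmissionError(
            f"Expected file '{filename}' was not found in the submission."
        )
    return path
```

The caller can then catch `SubmissionError` separately and send the email there, while any other exception is left to the pipeline's normal error handling.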

Note: a "feature flag" is just an env var read by a conditional that lets you toggle a feature on or off. It means you can merge an implementation and turn it on at a later date. We're using one here because we know we need email capability but don't yet have a way to get submitters' email addresses.

What to do

Dev prep: set up an SES recipient email you can access

To try this out, you'll need to add the recipient email address to SES and verify it (SES sends the usual confirmation link to that address). Note: for the sender address I just created a free Gmail account; you can do the same if you don't want to use a personal address. Once you have a recipient email, the setup steps are:

For now (again, you would never normally do this) you can set the env var TEMPORARY_SUBMITTER_EMAIL in Glue.

This will be picked up by this temporary function, allowing you to send emails to that address via SES.
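A temporary lookup of this kind might look roughly like the sketch below. Only the env var name TEMPORARY_SUBMITTER_EMAIL comes from the issue; the function name `get_temporary_submitter_email` is hypothetical and may not match the function in the branch.

```python
import os
from typing import Optional


def get_temporary_submitter_email() -> Optional[str]:
    """Hypothetical stand-in for the temporary submitter lookup.

    Reads the address from TEMPORARY_SUBMITTER_EMAIL and returns None when
    the variable is unset or blank, so callers can skip sending.
    """
    email = os.environ.get("TEMPORARY_SUBMITTER_EMAIL", "").strip()
    return email or None
```

Returning `None` rather than raising keeps the email step optional: if no address is configured, the pipeline can simply carry on without notifying anyone.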

Make the changes

I've started with an example here: https://github.com/ONSdigital/dp-data-pipelines/blob/email/dpypelines/pipeline/dataset_ingress_v1.py.

Build on this branch; when you push code, the pipeline will be updated to use it.

This branch is deployed in bleed. You can trigger the pipeline by putting tar files in the submission bucket.
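To trigger a test run, you could package some files as a tar and drop it into the submission bucket with boto3, as in the sketch below. The function names and the bucket name you pass in are hypothetical (the issue does not name the bucket), and it assumes AWS credentials for the bleed account are already configured. The S3 client is injectable so the packaging step can be tried without AWS.

```python
import tarfile
from pathlib import Path


def build_submission_tar(source_dir: Path, tar_path: Path) -> Path:
    """Pack every regular file in source_dir into an uncompressed tar."""
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(source_dir.iterdir()):
            if path.is_file():
                # arcname stores just the file name, not the local directory path.
                tar.add(path, arcname=path.name)
    return tar_path


def upload_submission(tar_path: Path, bucket: str, s3_client=None) -> str:
    """Upload the tar to the (hypothetical) submission bucket; return the key."""
    if s3_client is None:
        import boto3  # imported lazily so packaging works without boto3 installed

        s3_client = boto3.client("s3")
    key = tar_path.name
    s3_client.upload_file(str(tar_path), bucket, key)
    return key
```

Leaving out an expected file before building the tar is an easy way to exercise the "file does not exist" path that should trigger an email.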

Now that's all set up, the task is:

You don't need War and Peace at this point, just a simple email that tells the data submitter what the problem was.
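A short plain-text email is enough. The sketch below builds a subject and body and sends them through the SES `send_email` call (`Source`, `Destination.ToAddresses`, `Message.Subject`/`Body.Text`), which is the standard boto3 SES client API. The helper names and wording of the message are hypothetical, and the client is passed in so it can be created however the pipeline already sets up AWS clients.

```python
from typing import Tuple


def build_invalid_submission_email(filename: str, problem: str) -> Tuple[str, str]:
    """Return (subject, body) for a simple 'your submission is invalid' email."""
    subject = f"Problem with your data submission: {filename}"
    body = (
        "Hello,\n\n"
        f"We could not process your submission '{filename}'.\n"
        f"Problem: {problem}\n\n"
        "Please correct this and resubmit.\n"
    )
    return subject, body


def send_plain_text_email(ses_client, sender: str, recipient: str,
                          subject: str, body: str) -> dict:
    """Send a plain-text email via an SES client (e.g. boto3.client('ses'))."""
    return ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    )
```

While the SES account is in sandbox mode, both the sender and recipient addresses need to be verified, which is why the dev prep step above is required.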

For the feature flag, I've marked a place in the code where you could add one. Just pick an env var with a sensible name and use it to turn emails on or off: https://github.com/ONSdigital/dp-data-pipelines/blob/a2aecc58fc4417546105e910820c530b83582833/dpypelines/pipeline/shared/utility.py#L32
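A feature flag of this kind can be a single env var check in front of the send. In this sketch the name `ENABLE_SUBMITTER_EMAILS` is a hypothetical choice, the flag defaults to off so the code can be merged before it is switched on, and `send` stands in for whatever function actually sends the email.

```python
import os
from typing import Callable, Optional

# Hypothetical flag name; the issue only asks for "an env var with a sensible name".
EMAIL_FLAG = "ENABLE_SUBMITTER_EMAILS"


def emails_enabled() -> bool:
    """True only when the flag env var is explicitly set to a truthy value."""
    value = os.environ.get(EMAIL_FLAG, "false").strip().lower()
    return value in {"1", "true", "yes", "on"}


def notify_submitter(send: Callable[[str, str, str], object],
                     recipient: Optional[str], subject: str, body: str) -> bool:
    """Call send(recipient, subject, body) if emails are enabled and we have
    a recipient. Returns True if an email was sent, False otherwise."""
    if not emails_enabled() or not recipient:
        return False
    send(recipient, subject, body)
    return True
```

Defaulting to "off" means a missing or misspelled env var fails safe: no emails go out until someone deliberately enables them.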

Acceptance Criteria