ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

lock down regex, add more representative transform namespacing #67

Closed mikeAdamss closed 5 months ago

mikeAdamss commented 5 months ago

What

Another consolidation PR to tidy things up.

RE regex - the r prefix just means "raw string", so backslashes are passed through to the regex engine as-is and you don't have to double-escape them (e.g. you can write \. to match a literal '.'). ^ means start of string, $ means end of string. Without these anchors the pattern can annoyingly match unexpected system files generated during the decompression (e.g. it was double matching and finding a file called _pipeline-config.json, which we don't care about).
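A minimal sketch of the anchoring point above. The file names and both patterns here are made up for illustration and are not necessarily the ones used in this PR; the only thing it aims to show is how adding ^ and $ stops the pattern matching inside a longer file name.

```python
import re

# Hypothetical listing of files found after decompressing a tar.
# These names are illustrative only.
extracted_files = ["data.csv", "pipeline-config.json", "_pipeline-config.json"]

# Hypothetical patterns (assumptions, not taken from the repo).
# The r prefix keeps the backslash intact so \. reaches the regex engine
# and matches a literal dot.
unanchored = re.compile(r"pipeline-config\.json")
anchored = re.compile(r"^pipeline-config\.json$")


def matching_files(pattern, names):
    """Return the names in which the pattern is found."""
    return [name for name in names if pattern.search(name)]


# Without anchors the pattern is found inside the unwanted file name too.
loose_matches = matching_files(unanchored, extracted_files)

# With ^ and $ the whole name has to match.
strict_matches = matching_files(anchored, extracted_files)

print(loose_matches)
print(strict_matches)
```

Using re.fullmatch with the unanchored pattern would be another way to get the same strictness.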

How to review

Set the 4 env vars from here: https://github.com/ONSdigital/dp-data-pipelines/blob/sandbox/dpypelines/pipeline/shared/notification.py to the webhook URL (see the Slack channel or ask in Slack for a webhook). Literally run export DE_SLACK_WEBHOOK=<the webhook url> in your terminal, and so on for the rest.
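If it helps, here is a rough sketch of a pre-flight check you could run before the review script. It is not part of the repo: missing_env_vars and REQUIRED_ENV_VARS are hypothetical names, and only DE_SLACK_WEBHOOK is listed because it is the only variable named in this PR; the other three would need to be copied in from notification.py.

```python
import os

# Only DE_SLACK_WEBHOOK is named in this PR description. Add the other
# variable names from notification.py to this tuple (assumption: they are
# all plain environment variables).
REQUIRED_ENV_VARS = ("DE_SLACK_WEBHOOK",)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict as environ makes the check easy to try without touching your real environment.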

Run this in a script in the root of the repo:

from dpypelines import s3_tar_received

s3_tar_received.start("dp-bleed-ingest-submission-bucket/valid.tar")

You'll probably need to be signed into AWS via SSO to do this. If it works you should get an "it worked" type message via a Slack notification (if it doesn't, you should get error messages via Slack too).

Who can review

Anyone with AWS SSO.