ONSdigital / dp-data-pipelines

Pipeline specific python scripts and tooling for automated website data ingress.
MIT License

implement data_ingress_v1 - task 1 #35

Closed: mikeAdamss closed this issue 7 months ago

mikeAdamss commented 8 months ago

DEPENDS ON

What is this

We're getting to the point where we can begin assembling the function https://github.com/ONSdigital/dp-data-pipelines/blob/sandbox/pipelines/pipeline/dataset_ingress_v1.py using the "small lego bricks" we've made so far.

For context: this function is called when the source data files are already present on the machine running Python. S3, tar files, etc. are not a consideration here; the calling function has already dealt with them.

What to do

The function will take one argument: the path to a local directory, as a string (an absolute path, not a Path object).

From there:

In all cases, use a try/except block and notify data engineering if an issue occurs, before you re-raise.

It's just a function, so you can run it as such while developing.

Avoid adding code beyond try/except blocks and calls to things we've already written. This function is intended as a wrapper around tested functionality and should not provide extra functionality in and of itself.

You do not need to write tests, as this will be covered by acceptance tests. Do sanity check this quite thoroughly, though, as those tests will not be complete when this is picked up.

Acceptance Criteria

mikeAdamss commented 8 months ago

5

mikeAdamss commented 7 months ago

done