For context: this function is called when the source data files are already present on the machine running Python. S3, tar files etc. are not a consideration here; the calling function has already dealt with them.
What to do
The function takes one argument: a path to a local directory (as a string holding an absolute path, not a Path object).
From there:
create a LocalDirectoryStore from it
verify that the specified required files have been provided
verify that the specified supplementary distributions have been provided
use the config "pipeline" key to get the transform and sanity checking code for the source in question. You don't need to use it in this task, just make sure you can get it, ready for the follow-up task.
In all cases use a try/except block and notify data engineering in the event of an issue before you re-raise.
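The steps and the error-handling rule above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `notify_data_engineering`, `PIPELINE_DETAILS`, `dataset_ingress_sketch`, and the config keys `required_files` and `supplementary_distributions` (treated here as plain lists of file names) are all hypothetical, and plain `pathlib` stands in for LocalDirectoryStore because its real interface is not shown here. Only the "pipeline" key and the pipeline-config.json file name come from this ticket.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def notify_data_engineering(message: str) -> None:
    # Hypothetical notifier: stands in for whatever alerting helper the project provides.
    print(f"NOTIFY data engineering: {message}")


# Hypothetical registry mapping the config "pipeline" value to the transform
# and sanity checking code for that source.
PIPELINE_DETAILS: Dict[str, Dict[str, Callable]] = {
    "example_pipeline": {
        "transform": lambda source: source,
        "sanity_check": lambda source: None,
    }
}


def _missing(expected: List[str], present: List[str]) -> List[str]:
    # Names that were expected but are not in the directory.
    return [name for name in expected if name not in present]


def dataset_ingress_sketch(
    files_dir: str,
    notify: Callable[[str], None] = notify_data_engineering,
) -> Dict[str, Callable]:
    # Step 1: read the directory (stand-in for creating a LocalDirectoryStore).
    try:
        directory = Path(files_dir)
        if not directory.is_dir():
            raise NotADirectoryError(files_dir)
        present = [p.name for p in directory.iterdir() if p.is_file()]
        config = json.loads((directory / "pipeline-config.json").read_text())
    except Exception as err:
        notify(f"Could not read source directory {files_dir}: {err}")
        raise

    # Step 2: verify the required files (assumed config shape: list of names).
    try:
        missing = _missing(config["required_files"], present)
        if missing:
            raise FileNotFoundError(f"Missing required files: {missing}")
    except Exception as err:
        notify(f"Required files check failed: {err}")
        raise

    # Step 3: verify the supplementary distributions (same assumed shape).
    try:
        missing = _missing(config["supplementary_distributions"], present)
        if missing:
            raise FileNotFoundError(f"Missing supplementary distributions: {missing}")
    except Exception as err:
        notify(f"Supplementary distributions check failed: {err}")
        raise

    # Step 4: fetch the transform and sanity checking code; it is only
    # retrieved here, not run.
    try:
        details = PIPELINE_DETAILS[config["pipeline"]]
    except Exception as err:
        notify(f"Could not get pipeline details: {err}")
        raise

    return details
```

Each step gets its own try/except so the notification can say which stage failed, and the bare `raise` keeps the original traceback. In the real function each step body would be a call into already-tested code rather than the inline checks used here.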
It's just a function, so you can simply run it as such while developing.
Avoid adding code beyond try/excepts and calls to things we've already written. This function is intended as a wrapper around tested functionality and should not provide extra functionality in and of itself.
You do not need to write tests as this will be covered by acceptance tests. Do sanity check this quite thoroughly as these tests will not be complete when this is picked up.
Acceptance Criteria
[ ] You can run this function locally: if you point it at a directory with a pipeline-config.json, the logic should run through as expected.
DEPENDS ON
What is this
We're getting to the point where we can begin assembling the function https://github.com/ONSdigital/dp-data-pipelines/blob/sandbox/pipelines/pipeline/dataset_ingress_v1.py using the "small lego bricks" we've made thus far.