What
The word "pipeline" was getting heavily overloaded and was making it hard for people to follow the logic, especially in relation to the config we're passing in. This should clear all that up.

There are two principal configurable inputs coming from the config JSON that dictate which code from this repo is applied to a given input from S3.

_Remember, we start at `s3_tar_received.start()` if it helps._

(a) The `pipeline` field will (not just yet, it's hard coded for now) dictate which of the scripts from `./pipelines` provides the overall steps to be applied (and is called from `s3_tar_received`).

(b) The `transform_identifier` field specifies the transform code from this repo to be utilised by the "transform the data" part of (a), which we'll be plumbing in next.

These were previously a little muddled, so I have tidied up the nomenclature; see here for the newer and clearer config example: https://github.com/ONSdigital/dp-data-pipelines/blob/minor-fixes/tests/fixtures/test-cases/test_pipeline_config_valid_id.json
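Based on the two fields described above, the config might look something like this (a rough sketch with placeholder values, not the actual fixture; see the linked example for the real shape):

```json
{
    "pipeline": "name-of-pipeline-script",
    "transform_identifier": "name-of-transform"
}
```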
How to review
Sanity check, and run the tests yourself.

I had to update some tests as well, and I ran `make fmt` and `make lint`, so it looks like a lot more has changed than actually has.

Who can review
Anyone.