What is this

We need to add the block of logic to the pipeline that uses the upload client to upload the csv and any supplementary distributions to the upload service.
This will not work yet, as platform and auth need to be sorted out. The task is to get the logic in place so that it should work once those things exist, and to bolt down the behaviour with acceptance tests.
What to do

There is a client for this in dp-python-tools: https://github.com/ONSdigital/dp-python-tools/tree/develop/dpytools/http.

You'll need to make some assumptions to do this:

- assume an env var of UPLOAD_SERVICE_S3_BUCKET - the bucket we're uploading files to
- assume an env var of UPLOAD_SERVICE_URL - the url of the upload service
- create a function in utility of get_florence_access_token() that returns "not-implemented" as the token
- create a function of create_upload_client() or similar (as sooner or later we'll have an alternative pipeline that needs the same client). This would create an upload client using UPLOAD_SERVICE_URL
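A minimal sketch of those two helpers, assuming the client is constructed from the upload service url. Note the `UploadClient` class below is a self-contained stand-in so the sketch runs on its own; the real client lives in dpytools and its import path and constructor signature may differ, so check before wiring it in.

```python
import os


def get_florence_access_token() -> str:
    # Stub until Florence auth is wired up; see ticket body.
    return "not-implemented"


class UploadClient:
    # Stand-in for the dpytools http upload client, so this sketch is
    # self-contained; swap in the real import once the API is confirmed.
    def __init__(self, upload_service_url: str):
        self.upload_service_url = upload_service_url


def create_upload_client() -> UploadClient:
    # Assumes the client only needs the upload service url from the env.
    return UploadClient(os.environ["UPLOAD_SERVICE_URL"])
```

Keeping `create_upload_client()` as the single construction point means the alternative pipeline mentioned above can reuse it unchanged.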
that should be enough to put the logic in place; you're aiming for something (super roughly) like this:

```
# note - all fallible steps in their own try/except please
upload_client: UploadClient = create_upload_client()
upload_bucket = os.environ["UPLOAD_SERVICE_S3_BUCKET"]
florence_token = get_florence_access_token()

# for the csv
upload_client.upload_csv(
    <path to csv>,
    upload_bucket,
    florence_token
)

# pseudo code loop
for supplementary_distribution in supplementary_distributions:
    # look at the file extension
    # call the upload client with the appropriate method
```
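The per-file dispatch in that loop could look like the sketch below. The method names (`upload_csv`, `upload_sdmx`) are assumptions: check the real `UploadClient` in dpytools for what it actually exposes, and note the fallible upload call sits in its own try/except as per the note above.

```python
from pathlib import Path


def upload_supplementary_distribution(upload_client, file_path, upload_bucket, florence_token):
    """Pick an upload method by file extension (method names are assumed)."""
    extension = Path(file_path).suffix.lower()
    if extension == ".csv":
        upload = upload_client.upload_csv
    elif extension == ".xml":
        upload = upload_client.upload_sdmx
    else:
        raise NotImplementedError(f"No known upload method for '{extension}' files")

    # fallible step in its own try/except, per the note above
    try:
        upload(file_path, upload_bucket, florence_token)
    except Exception as err:
        raise Exception(f"Failed to upload {file_path}") from err
```

Raising `NotImplementedError` for unexpected extensions keeps the failure loud rather than silently skipping a distribution.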
......but...but....but, how do I confirm it's working?????
We wrote some acceptance test steps that allow you to capture the outgoing http requests from the pipeline. See these steps here: https://github.com/ONSdigital/dp-data-pipelines/blob/sandbox/features/temporary.feature

We need to update the acceptance tests in dataset_ingress_v1.feature to confirm the upload client is making the expected http posts.
So roughly (you'll likely need to finagle logic a little) it becomes something like:
Given a temporary source directory of files
| file | fixture |
| data.xml | esa2010_test_data.xml |
And a dataset id of 'valid'
And v1_data_ingress starts using the temporary source directory
Then the pipeline should generate no errors
And I read the csv output 'data.csv'
And the csv output should have '9744' rows
And the csv output has the columns
| ID | Test | Name xml:lang |
And I read the metadata output 'metadata.json'
And the metadata should match 'fixtures/correct_metadata.json'
And the backend receives a request to "/upload-new"
And the json payload received should match "fixtures/whatever-this-needs-to-be.json"
And the headers received should match
| key | value |
| something | expected |
| something else | expected |
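The request-assertion steps above might be implemented along these lines, assuming the harness (as in temporary.feature's steps) records each captured outgoing request on the behave context. Everything here (`context.requests`, the dict keys) is an assumption, not the real step code; adjust to match the actual capture steps.

```python
# Hypothetical behave-style step bodies. We assume the test harness
# stores each captured outgoing request on the context as a dict with
# "path", "json" and "headers" keys -- adjust to the real capture steps.


def step_backend_receives_request(context, endpoint):
    paths = [request["path"] for request in context.requests]
    assert endpoint in paths, f"No request was made to {endpoint}, got: {paths}"


def step_headers_received_should_match(context, expected_headers):
    # expected_headers: header key -> expected value, built from the step table
    received = context.requests[-1]["headers"]
    for key, value in expected_headers.items():
        assert received.get(key) == value, f"Header {key}: {received.get(key)!r} != {value!r}"
```

This keeps the assertions on the pipeline's outgoing side only, which matches the point below: we never need a real upload service in the tests.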
The overarching point is we're not looking to test that an upload service exists and is working (not our problem); we're looking to test that the pipeline is making the required outgoing requests.
Note - don't add a feature flag for this; it's the last thing that happens, and I'm entirely fine with the pipelines erroring on the last step until we get them into a real env.
Acceptance Criteria