What
dpypelines/pipeline/dataset_ingress_v1.py:
Env vars UPLOAD_SERVICE_URL and UPLOAD_SERVICE_S3_BUCKET are read to set the upload_url and s3_bucket variables.
Florence access token set via the get_florence_access_token() function in dpypelines/pipeline/shared/utils.py.
UploadClient created from upload_url using create_upload_client() in dpypelines/pipeline/shared/utils.py.
CSV file uploaded to the Upload Service.
Supplementary distributions uploaded to the Upload Service (only if the file extension is ".xml"), found using the get_supplementary_distribution_file() function from dpypelines/pipeline/shared/utils.py.
NotImplementedError raised if a supplementary distribution's file extension is not ".xml" (see the sketch after this list).
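For reference, the flow looks roughly like this (a minimal sketch, not the exact code: the upload_client method name, the get_supplementary_distribution_file() call shape, and the placeholder paths are assumptions):

```python
import os
from pathlib import Path

from dpypelines.pipeline.shared.utils import (
    create_upload_client,
    get_florence_access_token,
    get_supplementary_distribution_file,
)

csv_path = "outputs/data.csv"  # produced earlier in the pipeline (placeholder)
input_directory = "inputs/"    # directory the pipeline was given (placeholder)

# Env vars configure the Upload Service endpoint and target bucket.
upload_url = os.environ["UPLOAD_SERVICE_URL"]
s3_bucket = os.environ["UPLOAD_SERVICE_S3_BUCKET"]

florence_access_token = get_florence_access_token()
upload_client = create_upload_client(upload_url)

# Upload the CSV output (method name is illustrative).
upload_client.upload(csv_path, s3_bucket, florence_access_token)

# Upload the supplementary distribution; only .xml is supported so far.
supp_dist_path = get_supplementary_distribution_file(input_directory)  # call shape assumed
extension = Path(supp_dist_path).suffix
if extension == ".xml":
    upload_client.upload(supp_dist_path, s3_bucket, florence_access_token)
else:
    raise NotImplementedError(f"Uploading {extension} files is not supported.")
```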
features/dataset_ingress_v1.feature:
Steps added so that the backend Flask app captures the outgoing requests from the UploadClient.
features/steps/dataset_ingress.py:
Added a new valid_no_supp_dist entry to the CONFIGURATION dictionary. The test fails if more than one HTTP request is made, so we aren't currently testing that the supplementary distributions are also uploaded (I checked on sandbox, and that is definitely working, though).
features/docker/fake_backend/app.py:
Explicit GET and POST methods added to @app.route("/<path:path>") (as uploading is a POST request).
JSON content of the request captured in a this-requests-json logging statement (silent=True added to stop it failing when there's no JSON content to get); see the sketch below.
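The route now looks something like this (a minimal sketch, not the exact code in the fake backend):

```python
from flask import Flask, request

app = Flask(__name__)

# Catch-all route; POST is needed because uploads are POST requests.
@app.route("/<path:path>", methods=["GET", "POST"])
def catch_all(path):
    # silent=True makes get_json() return None instead of raising
    # when the body isn't JSON (e.g. a raw CSV upload).
    app.logger.info("this-requests-json: %s", request.get_json(silent=True))
    return "ok"
```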
features/environment.py:
Env vars for UPLOAD_SERVICE_URL, UPLOAD_SERVICE_S3_BUCKET and FLORENCE_TOKEN are set to acceptable values for acceptance tests in before_all(), and reverted to original values in after_all().
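Roughly as follows (the values shown are placeholders, not the ones the tests actually use):

```python
import os

# Placeholder values; the real ones must be acceptable to the acceptance tests.
TEST_ENV_VARS = {
    "UPLOAD_SERVICE_URL": "http://localhost:5000",
    "UPLOAD_SERVICE_S3_BUCKET": "test-bucket",
    "FLORENCE_TOKEN": "not-a-real-token",
}

def before_all(context):
    # Remember the original values so they can be restored afterwards.
    context.original_env_vars = {k: os.environ.get(k) for k in TEST_ENV_VARS}
    os.environ.update(TEST_ENV_VARS)

def after_all(context):
    for key, value in context.original_env_vars.items():
        if value is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = value
```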
features/steps/requests.py:
_parse_request_body_from_log() function added to get content that can't be parsed as a dictionary (as is the case for the CSV file).
_parse_dict_from_log(): refactored out of _parse_request_headers_as_dict_from_log() so it's reusable for getting both headers and JSON content from logs.
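The shape of the refactor is roughly as below (the log marker strings and exact parsing are assumptions; see the actual file for details):

```python
import json

def _parse_dict_from_log(log: str, marker: str) -> dict:
    # Shared helper: grab the text that follows `marker` in the captured
    # log output and parse it as JSON into a dictionary.
    payload = log.split(marker, 1)[1].strip()
    return json.loads(payload)

def _parse_request_headers_as_dict_from_log(log: str) -> dict:
    # Marker string is illustrative.
    return _parse_dict_from_log(log, "request-headers:")

def _parse_request_body_from_log(log: str) -> str:
    # Body content (e.g. a raw CSV file) can't be parsed as a dictionary,
    # so return it untouched.
    return log.split("request-body:", 1)[1].strip()
```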
How to review
Set env vars for UPLOAD_SERVICE_URL, UPLOAD_SERVICE_S3_BUCKET and FLORENCE_TOKEN, then run dataset_ingress_v1() with an appropriate input directory of files and pipeline config.
Check that an appropriate try...except structure is used, and that the logging statements capture everything needed.
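Something along these lines (illustrative values; check the actual dataset_ingress_v1() signature and pipeline config shape first):

```python
import os
from dpypelines.pipeline.dataset_ingress_v1 import dataset_ingress_v1

os.environ["UPLOAD_SERVICE_URL"] = "https://upload.example.com"
os.environ["UPLOAD_SERVICE_S3_BUCKET"] = "my-test-bucket"
os.environ["FLORENCE_TOKEN"] = "<your token>"

pipeline_config = {}  # supply an appropriate pipeline config here
dataset_ingress_v1("path/to/input/files", pipeline_config)
```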
Who can review
Anyone.