ONSdigital / dp-python-tools

Simple reusable python resources for digital publishing.
MIT License

#10 upload client #28

Closed SarahJohnsonONS closed 6 months ago

SarahJohnsonONS commented 6 months ago

What

Adds an UploadClient class with upload(), _create_temp_chunks() and _delete_temp_chunks() methods (based on this script).

UploadClient is instantiated with an upload_url. To use it with the DP Upload Service, you will need a Florence login (instructions to follow) to generate an access token, which should be exported as an environment variable called FLORENCE_TOKEN.
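As a minimal sketch of the token requirement described above, the helper below reads FLORENCE_TOKEN from the environment and fails early with a clear message if it is missing. The function name `get_florence_token` is hypothetical and is not part of this PR; how UploadClient itself reads the token may differ.

```python
import os


def get_florence_token() -> str:
    """Return the Florence access token from the environment.

    Hypothetical helper for illustration only; not part of this PR.
    """
    token = os.environ.get("FLORENCE_TOKEN")
    if not token:
        # Fail before any HTTP request is attempted.
        raise RuntimeError(
            "FLORENCE_TOKEN is not set; export a Florence access token first."
        )
    return token
```

Checking for the token up front means a missing or empty variable surfaces as a readable error rather than as an authentication failure from the upload service.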

The upload() method accepts a CSV file path (as a string or pathlib.Path), an S3 bucket URL and an optional chunk size (default 5242880 bytes, i.e. 5 MiB). The CSV file is split into chunks, and each chunk is uploaded to the client's upload_url. The method returns the full S3 URL and the S3 object key.
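The chunking step can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the function names, the `output_dir` parameter and the chunk file naming are all hypothetical, and the real _create_temp_chunks() and _delete_temp_chunks() may work differently.

```python
from pathlib import Path
from typing import List, Union

DEFAULT_CHUNK_SIZE = 5242880  # 5 MiB, the default quoted in the description


def create_temp_chunks(
    csv_path: Union[str, Path],
    output_dir: Union[str, Path],
    chunk_size: int = DEFAULT_CHUNK_SIZE,
) -> List[Path]:
    """Split a file into numbered chunk files of at most chunk_size bytes."""
    csv_path = Path(csv_path)
    output_dir = Path(output_dir)
    chunk_paths: List[Path] = []
    with open(csv_path, "rb") as source:
        chunk_number = 1
        while True:
            data = source.read(chunk_size)
            if not data:
                break  # end of file reached
            # Hypothetical naming scheme: <file name>-chunk-<n>
            chunk_path = output_dir / f"{csv_path.name}-chunk-{chunk_number}"
            chunk_path.write_bytes(data)
            chunk_paths.append(chunk_path)
            chunk_number += 1
    return chunk_paths


def delete_temp_chunks(chunk_paths: List[Path]) -> None:
    """Remove the temporary chunk files once the upload has finished."""
    for chunk_path in chunk_paths:
        chunk_path.unlink()
```

Reading in binary mode keeps the chunk boundaries byte-exact, so concatenating the chunks in order reproduces the original file regardless of its encoding or line endings.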

How to review

Things to note:

Testing note: To avoid real HTTP requests, test_upload() uses vcrpy: the expected request and response are stored in tests/http/cassettes/test_upload_new.yaml. Additional configuration in conftest.py filters out sensitive and dynamic content.
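For reviewers unfamiliar with vcrpy, the kind of filtering mentioned above might look like the sketch below. It is not a copy of the conftest.py in this PR: the header names (`Authorization`, `X-Florence-Token`, `Date`, `Set-Cookie`) and both function names are assumptions chosen for illustration. `filter_headers` and `before_record_response` are standard vcrpy options.

```python
# Assumed names of response headers that change on every run.
DYNAMIC_RESPONSE_HEADERS = {"date", "set-cookie"}


def scrub_response(response: dict) -> dict:
    """Drop dynamic headers from a response before it is written to a cassette.

    vcrpy passes the response as a dict whose "headers" value maps
    header names to lists of values.
    """
    scrubbed = dict(response)  # shallow copy so the live response is untouched
    scrubbed["headers"] = {
        name: values
        for name, values in response.get("headers", {}).items()
        if name.lower() not in DYNAMIC_RESPONSE_HEADERS
    }
    return scrubbed


def build_vcr_config() -> dict:
    """Return vcrpy options that keep secrets out of recorded cassettes."""
    return {
        # (name, replacement) pairs overwrite the header value in the cassette.
        "filter_headers": [
            ("Authorization", "REDACTED"),
            ("X-Florence-Token", "REDACTED"),
        ],
        "before_record_response": scrub_response,
    }
```

A dict like this could be passed as keyword arguments to `vcr.VCR(...)`, or returned from a `vcr_config` fixture if a pytest plugin for vcrpy is in use.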

This could probably do with a couple more tests, but I wanted to get some feedback before adding them.

Who can review

Anyone.