What

`UploadClient` class created, along with `upload()`, `_create_temp_chunks()` and `_delete_temp_chunks()` methods (based on this script).
`UploadClient` needs an `upload_url` to instantiate it. To use it to access the DP Upload Service, you will need a Florence login (instructions to follow) in order to generate an access token, which should be exported as an environment variable called `FLORENCE_TOKEN`.
The `upload()` method accepts a CSV file path (as a string or `pathlib.Path`), an S3 bucket URL and an optional chunk size (default 5242880 bytes, i.e. 5 MiB). The CSV file is split into chunks, and each chunk is uploaded to the specified `upload_url`. The method returns the full S3 URL and the S3 object key.
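For illustration, a minimal usage sketch (the URLs are placeholders, and the import path and argument names are assumed from the description above, not taken from the code):

```python
import os

# Assumed import path - adjust to wherever UploadClient actually lives.
from upload_client import UploadClient

# The Florence access token must already be exported, e.g.
#   export FLORENCE_TOKEN=<token>
assert "FLORENCE_TOKEN" in os.environ

client = UploadClient(upload_url="https://<host>/upload")  # placeholder URL
s3_url, s3_object_key = client.upload(
    "data/my_data.csv",                            # str or pathlib.Path
    "https://<bucket>.s3.<region>.amazonaws.com",  # placeholder bucket URL
    chunk_size=5242880,                            # optional; this is the default
)
```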
How to review
Things to note:
The `upload` method is currently configured to match the old `upload` endpoint specification. Once the `upload-new` endpoint has been exposed, the `params` argument passed to `self.post` will need to be adjusted to the new specification.
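For context, a sketch of roughly what those params look like against the old endpoint. The field names follow the resumable.js convention; they are illustrative assumptions, not copied from the client code:

```python
from pathlib import Path

def build_old_endpoint_params(
    file_path: Path, chunk_number: int, total_chunks: int, chunk_size: int
) -> dict:
    # Illustrative only: field names follow the resumable.js convention the
    # old endpoint appears to use - confirm against the real endpoint spec.
    return {
        "resumableFilename": file_path.name,
        "resumableChunkNumber": chunk_number,  # 1-based index of this chunk
        "resumableTotalChunks": total_chunks,
        "resumableChunkSize": chunk_size,      # configured chunk size in bytes
        "resumableTotalSize": file_path.stat().st_size,
        "resumableType": "text/csv",
        # Not currently populated (see the following notes):
        # "resumableCurrentChunkSize": ...,
        # "resumableRelativePath": ...,
        # "aliasName": ...,
    }
```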
The current `upload` endpoint expects a `resumableCurrentChunkSize` parameter - I tried to get this using `len(f.read())`, but this causes the request to fail, so I've removed it for now. Alternative solutions welcome.
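One possible cause (unverified against the actual code): `f.read()` consumes the file object, so if the same handle is then passed as the request body, the chunk arrives empty. Reading each chunk's bytes once and reusing them would avoid that, roughly:

```python
from pathlib import Path

chunk_path = Path("temp-chunks/chunk-1")  # hypothetical chunk file on disk

# Read the bytes once, then reuse them for both the length parameter and the
# request body, so the file pointer is never exhausted before the upload.
chunk_bytes = chunk_path.read_bytes()

params = {"resumableCurrentChunkSize": len(chunk_bytes)}
files = {"file": (chunk_path.name, chunk_bytes, "text/csv")}  # requests-style files dict
```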
There are two other parameters that aren't currently being populated (`resumableRelativePath` and `aliasName`), as I'm not sure what values these should take.
Testing note: to prevent unnecessary HTTP requests, `vcrpy` is used in `test_upload()` - the expected request and response are stored in `tests/http/cassettes/test_upload_new.yaml`. Additional configuration in `conftest.py` filters out sensitive/dynamic content.
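For reviewers unfamiliar with vcrpy, the filtering is configured along these lines (a sketch; the header and parameter names here are assumptions, not the actual `conftest.py` contents):

```python
# conftest.py (sketch only - the real configuration may differ)
import pytest
import vcr


@pytest.fixture
def upload_vcr():
    # Scrub secrets and run-specific values before they are written to the
    # cassette YAML. The header/parameter names below are assumptions.
    return vcr.VCR(
        cassette_library_dir="tests/http/cassettes",
        filter_headers=["Authorization", "X-Florence-Token"],
        filter_query_parameters=["resumableIdentifier"],
    )
```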
Could probably do with a couple more tests, but I wanted to get some feedback before adding these.
Who can review
Anyone.