What

`UploadClient` class created, along with `upload()`, `_create_temp_chunks()` and `_delete_temp_chunks()` methods (based on this script).
`UploadClient` needs an `upload_url` to instantiate it. To use it to access the DP Upload Service, you will need a Florence login (instructions to follow) in order to generate an access token, which should be exported as an environment variable called `FLORENCE_TOKEN`.
The `upload()` method accepts a CSV file path (as a string or `pathlib.Path`), an S3 bucket URL and an optional chunk size (default 5242880 bytes, i.e. 5 MiB). The CSV file is split into chunks, and each chunk is uploaded to the specified `upload_url`. The method returns the full S3 URL and the S3 object key.
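For illustration, a minimal usage sketch (the URLs are placeholders, and the import path and argument names are assumed from the description above, not taken from the code):

```python
import os

# Assumed import path - adjust to wherever UploadClient actually lives.
from upload_client import UploadClient

# The Florence access token must already be exported, e.g.
#   export FLORENCE_TOKEN=<token>
assert "FLORENCE_TOKEN" in os.environ

client = UploadClient(upload_url="https://<host>/upload")  # placeholder URL
s3_url, s3_object_key = client.upload(
    "data/my_data.csv",                            # str or pathlib.Path
    "https://<bucket>.s3.<region>.amazonaws.com",  # placeholder bucket URL
    chunk_size=5242880,                            # optional; this is the default
)
```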
How to review
Things to note:
The `upload` method is currently configured to match the old `upload` endpoint specification. Once the `upload-new` endpoint has been exposed, the `params` argument passed to `self.post` will need to be adjusted to the new specification.
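For context, a sketch of roughly what those params look like against the old endpoint. The field names follow the resumable.js convention; they are illustrative assumptions, not copied from the client code:

```python
from pathlib import Path

def build_old_endpoint_params(
    file_path: Path, chunk_number: int, total_chunks: int, chunk_size: int
) -> dict:
    # Illustrative only: field names follow the resumable.js convention the
    # old endpoint appears to use - confirm against the real endpoint spec.
    return {
        "resumableFilename": file_path.name,
        "resumableChunkNumber": chunk_number,  # 1-based index of this chunk
        "resumableTotalChunks": total_chunks,
        "resumableChunkSize": chunk_size,      # configured chunk size in bytes
        "resumableTotalSize": file_path.stat().st_size,
        "resumableType": "text/csv",
        # Not currently populated (see the following notes):
        # "resumableCurrentChunkSize": ...,
        # "resumableRelativePath": ...,
        # "aliasName": ...,
    }
```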
The current `upload` endpoint expects a `resumableCurrentChunkSize` parameter - I tried to get this using `len(f.read())`, but this causes the request to fail, so I've removed it for now. Alternative solutions welcome.
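One possible cause (unverified against the actual code): `f.read()` consumes the file object, so if the same handle is then passed as the request body, the chunk arrives empty. Reading each chunk's bytes once and reusing them would avoid that, roughly:

```python
from pathlib import Path

chunk_path = Path("temp-chunks/chunk-1")  # hypothetical chunk file on disk

# Read the bytes once, then reuse them for both the length parameter and the
# request body, so the file pointer is never exhausted before the upload.
chunk_bytes = chunk_path.read_bytes()

params = {"resumableCurrentChunkSize": len(chunk_bytes)}
files = {"file": (chunk_path.name, chunk_bytes, "text/csv")}  # requests-style files dict
```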
There are two other parameters that aren't currently being populated (`resumableRelativePath` and `aliasName`), as I'm not sure what values these should take.
Testing note: to prevent unnecessary HTTP requests, `vcrpy` is used in `test_upload()` - the expected request and response are stored in `tests/http/cassettes/test_upload_new.yaml`. Additional configuration in `conftest.py` filters out sensitive/dynamic content.
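For reviewers unfamiliar with vcrpy, the filtering is configured along these lines (a sketch; the header and parameter names here are assumptions, not the actual `conftest.py` contents):

```python
# conftest.py (sketch only - the real configuration may differ)
import pytest
import vcr


@pytest.fixture
def upload_vcr():
    # Scrub secrets and run-specific values before they are written to the
    # cassette YAML. The header/parameter names below are assumptions.
    return vcr.VCR(
        cassette_library_dir="tests/http/cassettes",
        filter_headers=["Authorization", "X-Florence-Token"],
        filter_query_parameters=["resumableIdentifier"],
    )
```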
Could probably do with a couple more tests, but I wanted to get some feedback before adding these.
Who can review
Anyone.