facebookresearch / Private-ID

A collection of algorithms that can perform a join between two parties while preserving the privacy of the keys on which the join happens
Apache License 2.0

Build multipart upload #85

Closed yuyashiraki closed 2 years ago

yuyashiraki commented 2 years ago

Summary:

Context

We found that the AWS SDK S3 API fails when we try to write more than 5GB of data. This is blocking us from doing capacity testing for a larger FARGATE container.

As mentioned in the post, one of our options is to use multipart upload, developed by AWS, which splits a file into smaller chunks that together form a single S3 file. We discussed this with AWS engineers and decided to develop multipart upload logic.

Technical Details

We split a file into 300MB (314572800 bytes) ByteStreams, then use multipart upload to upload each part and construct the single S3 file.
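
To make the part arithmetic concrete, here is a minimal Python sketch of how a file size maps onto 300MB parts. The helper name `part_ranges` and the 6 GiB example size are illustrative assumptions, not code from this PR.

```python
PART_SIZE = 314572800  # 300 * 1024 * 1024 bytes, the part size quoted above


def part_ranges(total_size, part_size=PART_SIZE):
    """Yield (part_number, offset, length) for each part of a file.

    Part numbers start at 1, matching S3 multipart upload numbering.
    part_ranges is a hypothetical helper used for illustration only.
    """
    offset = 0
    part_number = 1
    while offset < total_size:
        # Every part is full-sized except possibly the last one.
        length = min(part_size, total_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1


# Example: a 6 GiB file, which is above the 5GB limit described in Context.
ranges = list(part_ranges(6 * 1024**3))
print(len(ranges))   # 21
print(ranges[-1])    # (21, 6291456000, 150994944)
```

Under these assumptions a 6 GiB file becomes 20 full parts plus one final part of 150994944 bytes (144 MiB).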

The multipart upload consists of three steps.

  1. Create the multipart upload via create_multipart_upload(), which generates an upload ID.
  2. Send each ByteStream to S3 using upload_part().
  3. Complete the multipart upload via complete_multipart_upload() once all parts are uploaded.
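
The three steps can be sketched as follows. This is a minimal Python sketch, not the implementation in this PR: it assumes a boto3-style S3 client (for example one returned by `boto3.client("s3")`), and the function name `multipart_upload` and its parameters are hypothetical. The abort-on-error branch is an addition for illustration and is not described in the summary above.

```python
PART_SIZE = 314572800  # 300MB parts, as described above


def multipart_upload(client, bucket, key, stream, part_size=PART_SIZE):
    """Upload a binary stream to S3 in parts; return the list of parts.

    `client` is assumed to expose boto3-style S3 methods.
    `stream` is any binary file-like object with a read(n) method.
    """
    # Step 1: create the multipart upload and keep its upload ID.
    created = client.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = created["UploadId"]

    parts = []
    try:
        part_number = 1  # S3 part numbers start at 1
        while True:
            chunk = stream.read(part_size)
            if not chunk:
                break
            # Step 2: send one part and record the ETag S3 returns for it.
            resp = client.upload_part(
                Bucket=bucket,
                Key=key,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=chunk,
            )
            parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
            part_number += 1

        # Step 3: ask S3 to assemble the uploaded parts into one object.
        # Note: an empty stream yields no parts, which S3 would reject here.
        client.complete_multipart_upload(
            Bucket=bucket,
            Key=key,
            UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Assumption: discard already-uploaded parts if anything fails.
        client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
    return parts
```

A call might look like `multipart_upload(boto3.client("s3"), "my-bucket", "output.csv", open("output.csv", "rb"))`, where the bucket and key names are placeholders.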

Ref

Below are the references we used to develop multipart functionality.

Differential Revision: D39534523

facebook-github-bot commented 2 years ago

This pull request was exported from Phabricator. Differential Revision: D39534523