axman6 / amazonka-s3-streaming

Provides a conduit based interface to uploading data to S3 using the Multipart API
MIT License

Modularize this library and export small building blocks (?) #29

Open bitc opened 2 years ago

bitc commented 2 years ago

This comment is based on the latest code found here: https://github.com/axman6/amazonka-s3-streaming/blob/8af09b5dd73c5c0998efeb8c9d826855e794d908/src/Network/AWS/S3/StreamingUpload.hs

I haven't gotten too deep into this library yet, but from reading the source code and some of the discussions here on GitHub, as well as thinking through some ideas of my own, I have come to the realization that there are 5 tricky requirements for S3 streaming uploads:

  1. Robust and flexible error handling
  2. Retry logic
  3. Custom logging for things like progress bars
  4. Pausing/Stopping the upload, and resuming in the future
  5. Support for uploading using only presigned URLs (might not be a common use-case, but one I am interested in)

Trying to shove all of this functionality directly into one big function would bloat it with lots of configuration flags and callback functions.

I think a better approach would be to directly export the building blocks (for example the processAndChunkOutputRaw, enumerateConduit, startUpload, multiUpload, and finishMultiUploadConduit conduits). This way, users of the library can assemble and customize things to fit their exact needs.
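To make the "small building blocks" idea concrete, here is a hypothetical sketch using only base (no conduit or amazonka dependency, so every name below is illustrative rather than the library's real API): each step — chunking the input, uploading one part, driving the whole upload — is a separate function a caller could compose, replace, or wrap with retries and logging.

```haskell
module Main where

import Data.List (unfoldr)

-- Split an input stream into fixed-size parts (a stand-in for something
-- like enumerateConduit in the real library).
chunkParts :: Int -> [a] -> [[a]]
chunkParts n = unfoldr step
  where
    step [] = Nothing
    step xs = Just (splitAt n xs)

-- A stand-in for a per-part upload action; the real library would call
-- S3's UploadPart here. Returns a fake ETag per part.
uploadPart :: Int -> String -> IO (Int, String)
uploadPart partNum body = pure (partNum, "etag-" ++ show (length body))

-- The driver, composing the blocks: chunk, then upload each part in order.
-- A user who needs custom retry or progress reporting could write their
-- own driver from the same pieces.
uploadAll :: Int -> String -> IO [(Int, String)]
uploadAll partSize input =
  mapM (uncurry uploadPart) (zip [1 ..] (chunkParts partSize input))

main :: IO ()
main = do
  etags <- uploadAll 5 "hello world, streaming!"
  print etags
```

The point is that the exported surface would be the small functions, not one monolithic `uploadFile`, so concerns like pausing or presigned-URL uploads become new drivers rather than new configuration flags.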

I think it might require a bit of re-thinking to mold the code into such a modular structure, but it could result in a more powerful and flexible library. I am also intrigued by the discussion in https://github.com/axman6/amazonka-s3-streaming/issues/26 regarding concurrency, which I think is also crucial for good performance (buffering the next chunk while the current chunk is uploading).
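As a rough illustration of that overlap (again base-only and hypothetical, not the library's actual code), a one-slot MVar hand-off lets a producer thread prepare the next chunk while the consumer is still "uploading" the current one — the producer can never run more than one chunk ahead:

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar

-- Run a two-thread pipeline over the given chunks: a producer thread feeds
-- a one-slot MVar (so it stays at most one chunk ahead of the uploader),
-- while the consumer "uploads" each chunk as it arrives.
runPipeline :: [String] -> IO [String]
runPipeline chunks = do
  slot <- newEmptyMVar
  _ <- forkIO $ do
    mapM_ (putMVar slot . Just) chunks
    putMVar slot Nothing          -- end-of-stream marker
  let loop acc = do
        next <- takeMVar slot
        case next of
          Nothing -> pure (reverse acc)
          Just c  -> loop (("uploaded " ++ c) : acc)
  loop []

main :: IO ()
main = runPipeline ["part-1", "part-2", "part-3"] >>= mapM_ putStrLn
```

In a real implementation the slot would hold a fully buffered part body, and the "upload" step would be the actual UploadPart request, so network time and read/encode time overlap instead of alternating.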

Thank you