axman6 / amazonka-s3-streaming

Provides a conduit based interface to uploading data to S3 using the Multipart API
MIT License

backpressure? #1

Closed ababkin closed 7 years ago

ababkin commented 7 years ago

Alex, love the new lib, def looks very useful

I'm not too familiar with the upload part semantics. Let's say we operate in an environment of restricted RAM (like AWS Lambda) - are there any mechanisms in place (anywhere in the underlying amazonka code) that would prevent it from trying to upload so many parts that the available RAM is exhausted? Is it smart enough to provide this "backpressure", or is there a way to specify a limit on the number of concurrent part uploads?

axman6 commented 7 years ago

At the moment the code accumulates up to 6MB of data before sending that as a single part. If the ByteStrings being streamed in happen to be much larger than 6MB, an extra buffer will be allocated for the ByteString Builder's lazy ByteString, which could cause issues, but it generally shouldn't if the ByteStrings being streamed in are small. If you run into issues let me know, but this code should really be able to run in fairly constant space (say 12MB or so).
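The accumulation strategy described above can be sketched in isolation (sizes only, not the real conduit code; `partSize` and `partition` are illustrative names, not this library's API):

```haskell
-- Incoming chunks are buffered until at least partSize bytes have
-- accumulated, then flushed as one part; the remainder is flushed as the
-- final (possibly smaller) part. This mirrors the 6MB buffering described
-- above, but is a standalone sketch, not the library's implementation.

partSize :: Int
partSize = 6 * 1024 * 1024  -- 6MB part size, as in the comment above

-- Given the sizes of incoming ByteString chunks, return the sizes of the
-- parts that would be uploaded.
partition :: [Int] -> [Int]
partition = go 0
  where
    go acc []     = [acc | acc > 0]  -- flush whatever is left as the last part
    go acc (c:cs)
      | acc + c >= partSize = (acc + c) : go 0 cs  -- buffer full: emit a part
      | otherwise           = go (acc + c) cs      -- keep accumulating
```

Because at most one part's worth of data is buffered at a time, memory use stays near one part size regardless of the total upload size.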

axman6 commented 7 years ago

Also, this doesn't do any concurrent uploading. I might consider adding that, particularly for the case where we know it's a file being uploaded (we can use a Producer for each part that starts at the right offset in the file)
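The per-part offset idea could be sketched like this (`ranges` is a hypothetical helper, not part of the library): split a file of known size into `(offset, length)` ranges, one per part, so each part's Producer can seek to its own offset independently.

```haskell
-- Hypothetical sketch: compute the (offset, length) of each part for a
-- file of totalSize bytes, using parts of at most chunkSize bytes.
ranges :: Int -> Int -> [(Int, Int)]
ranges chunkSize totalSize =
  [ (off, min chunkSize (totalSize - off))  -- last range may be shorter
  | off <- [0, chunkSize .. totalSize - 1] ]
```

Each range could then back one concurrent part upload, since reads at distinct offsets don't interfere with each other.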

ababkin commented 7 years ago

ah, I assumed it's doing concurrent uploads. If it uploads the chunks strictly sequentially then there is no such problem as I've described above.

axman6 commented 7 years ago

yep. I'll have a think about how to do concurrent uploads too though. The Sink interface is probably less useful here. I might provide something like:

data UploadData
    = FilePath   FilePath                  -- upload an existing file
    | ByteString Strict.ByteString         -- upload an in-memory ByteString
    | IO (Int -> IO (Maybe ByteString))    -- part number as input; may be called many
                                           --   times until Nothing is returned
         (Either (IO ()) (IO ()))          -- and a function to close either this part
                                           --   or all parts

uploadConcurrently :: UploadData -> CreateMultipartUpload -> m CompleteMultipartUploadResponse
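On the backpressure question raised earlier: in a concurrent design like this, the number of in-flight part uploads could be bounded with a semaphore. Below is a hypothetical sketch using only base (`boundedConcurrently` is not part of this library), which ignores exception handling in the worker actions for brevity:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)

-- Run the given actions (e.g. one per part upload) with at most n running
-- at a time, returning their results in the original order. The semaphore
-- provides the backpressure: only n parts are in flight at once, bounding
-- the memory held by part buffers.
boundedConcurrently :: Int -> [IO a] -> IO [a]
boundedConcurrently n actions = do
  sem  <- newQSem n
  vars <- mapM
    (\act -> do
       v <- newEmptyMVar
       _ <- forkIO $
         bracket_ (waitQSem sem) (signalQSem sem) (act >>= putMVar v)
       pure v)
    actions
  mapM takeMVar vars
```

Each action would be one `UploadPart` call reading its own range of the file; `n` caps RAM at roughly `n` part buffers.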