bluelabsio / s3-stream

Akka Streaming Client for S3 and Supporting Libraries

Chunker should be able to use temp files instead of memory #12

Open jypma opened 8 years ago

jypma commented 8 years ago

The chunker currently requires at least 5 MB of memory for every ongoing upload stream. With 100 concurrent connections that is at least 500 MB of buffers, which can easily eat a Java heap with nothing left over.

Buffering to temp files instead should not add considerable overhead as long as the data stays within the disk cache, and it would allow the system as a whole to scale much further, provided one can live with (max S3 upload rate) = (max disk read speed).
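To make the proposal concrete, here is a minimal sketch of chunking to temp files, written in plain Java with only the standard library. It is not the s3-stream chunker and does not use Akka Streams; the class name `TempFileChunker`, its methods, and the `s3-chunk-` file prefix are all hypothetical. The idea is that each upload stream holds only a small write buffer in the heap, while full chunks sit on disk (and, ideally, in the OS disk cache) until they are uploaded.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: spools incoming bytes to temp files, one file per chunk. */
final class TempFileChunker implements AutoCloseable {
    private final long chunkSize;                       // e.g. 5 * 1024 * 1024 for S3 parts
    private final List<Path> completed = new ArrayList<>();
    private Path current;                               // temp file currently being filled
    private OutputStream out;                           // stream into `current`, or null
    private long written;                               // bytes written to `current`

    TempFileChunker(long chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    /** Appends data, rolling over to a new temp file whenever the current chunk is full. */
    void write(byte[] data) throws IOException {
        int offset = 0;
        while (offset < data.length) {
            if (out == null) openNext();
            int n = (int) Math.min(data.length - offset, chunkSize - written);
            out.write(data, offset, n);
            written += n;
            offset += n;
            if (written == chunkSize) finishCurrent();
        }
    }

    /** Closes the final (possibly short) chunk and returns all chunk files in order. */
    List<Path> finish() throws IOException {
        if (out != null) finishCurrent();
        return new ArrayList<>(completed);
    }

    private void openNext() throws IOException {
        current = Files.createTempFile("s3-chunk-", ".part");
        out = Files.newOutputStream(current);
        written = 0;
    }

    private void finishCurrent() throws IOException {
        out.close();
        completed.add(current);
        out = null;
        current = null;
    }

    /** Deletes every temp file this chunker created, including a partially written one. */
    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
            Files.deleteIfExists(current);
            out = null;
            current = null;
        }
        for (Path p : completed) Files.deleteIfExists(p);
        completed.clear();
    }
}
```

In a real implementation each completed file would be handed to the uploader as soon as it is full and deleted once that part has been acknowledged, rather than collected until `finish()`. The sketch keeps them in a list only to stay short.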

filosganga commented 8 years ago

I think it would be good for this to be configurable, ideally with at least 3 options:

joearasin commented 8 years ago

Interesting -- I hadn't thought about scaling to this extent. What sort of use case are we talking about? I'm picturing someone forking off a bunch of streams, leaving them open, and pushing data into them.

jypma commented 8 years ago

We are building a (huge) document storage system that will potentially save many documents concurrently. Some of them are small, some are up to several hundred MB. I expect the operations to be mostly I/O bound, so it makes sense to keep many upload streams to S3 open simultaneously, at least up to the point where we saturate our upload bandwidth from EC2.