Open jypma opened 8 years ago
I think it would be good for this to be configurable, ideally with at least 3 options:
Interesting -- I hadn't thought about scaling to this extent. What sort of use case are we talking about? I'm picturing someone forking off a bunch of streams, leaving them open, and pushing data into them.
We are building a (huge) document storage system that may be saving many documents concurrently. Some of them are small, some are up to several hundred MB. I expect the operations to be mostly I/O bound, so it makes sense to keep many upload streams to S3 open simultaneously, at least up to the point where we saturate our upload bandwidth from EC2.
The chunker currently requires at least 5 MB of memory for every ongoing upload stream. With 100 concurrent connections, that is at least 500 MB of buffers, which will easily eat up a Java heap with nothing left over.
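To make the arithmetic concrete, here is a rough sketch of the estimate. The class and method names are hypothetical, and the 5 MiB figure is an assumption taken from the per-stream minimum mentioned above; the real chunker may hold more than one chunk per stream.

```java
/** Hypothetical helper, for illustration only: lower bound on chunk buffer memory. */
final class ChunkBufferEstimate {
    // Assumption: each open upload stream holds at least one 5 MiB chunk in memory.
    static final long MIN_CHUNK_BYTES = 5L * 1024 * 1024;

    /** Minimum heap needed just for chunk buffers, in bytes. */
    static long minHeapBytesFor(int concurrentUploads) {
        return concurrentUploads * MIN_CHUNK_BYTES;
    }

    /** Same estimate expressed in MiB. */
    static long minHeapMiBFor(int concurrentUploads) {
        return minHeapBytesFor(concurrentUploads) / (1024 * 1024);
    }
}
```

For 100 streams, `ChunkBufferEstimate.minHeapMiBFor(100)` gives 500, before counting anything else the application keeps on the heap.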
Buffering to temp files instead should not add considerable overhead as long as the data stays within the disk cache, and it would let the overall system scale much further, provided one can live with the S3 upload rate being capped at the disk read speed: (max S3 upload rate) = (max disk read speed).
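Roughly what I have in mind, as a minimal sketch rather than a proposal for the actual API: `TempFileChunkBuffer` and its methods are hypothetical names, and it only uses standard `java.nio.file` calls. Each chunk is spooled to a temp file and read back as a stream when the part is uploaded, so the heap cost per stream is a small I/O buffer instead of the whole chunk.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: buffers one upload chunk in a temp file instead of on the heap. */
final class TempFileChunkBuffer implements AutoCloseable {
    private final Path file;
    private final OutputStream out;
    private long size = 0;

    TempFileChunkBuffer() throws IOException {
        // The file name prefix and suffix are arbitrary choices for this sketch.
        file = Files.createTempFile("s3-chunk-", ".part");
        // Only this small write buffer lives on the heap, not the chunk itself.
        out = new BufferedOutputStream(Files.newOutputStream(file), 8192);
    }

    /** Appends incoming data to the temp file. */
    void write(byte[] data, int offset, int length) throws IOException {
        out.write(data, offset, length);
        size += length;
    }

    /** Number of bytes buffered so far, e.g. to decide when a part is full. */
    long size() {
        return size;
    }

    /** Flushes pending writes and opens the chunk for reading as the upload body. */
    InputStream openForUpload() throws IOException {
        out.flush();
        return Files.newInputStream(file);
    }

    /** Closes the writer and deletes the temp file. */
    @Override
    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(file);
    }
}
```

The upload side would read from `openForUpload()` and close the buffer once the part has been sent. If the file is still in the OS disk cache at that point, the read should be cheap; otherwise the upload is limited by disk read speed, which is the trade-off described above.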