Nugine / s3s

S3 Service Adapter
Apache License 2.0
130 stars 31 forks source link

Unbounded Memory Allocation in `http::body` #155

Open Eosis opened 1 month ago

Eosis commented 1 month ago

I know you're aware that http::body, here, allows unbounded allocation.

Has any effort been spent on attempting to remove this allocation and allow streaming through the s3s interface? I may be able to spend some effort to help resolve this issue, but I'm not quite sure where to start at the moment.

Eosis commented 1 month ago

Further investigation has made me realise that my use case is streaming through (PutObject request).

I actually found that my memory bloat issue was caused by the signature check being forced to bring the body of the request into memory.

Unless I am mistaken, for a Single Chunk upload, it is not possible to verify the signature without first reading the entire body, meaning reading it into memory or disk. This is quite sad if you want to keep a low footprint 😢

I did, however, find that s3s is unnecessarily bringing the entire body into memory when the request's x-amz-content-sha256 header is UNSIGNED-PAYLOAD. I hacked a fix in a branch in my fork, here. This change (and a switch to using streaming in our client library) was enough to prevent the memory bloat for me. I'll try to make that prettier and push a PR when I have time.

YaZasnyal commented 1 month ago

I have been thinking about this for some time now. One possible solution could be to implement opportunistic checks, assuming that the signature is correct, and returning an error only when the last chunk is read. This approach would significantly reduce latency for PUT operations, especially in proxy mode.

Nugine commented 1 month ago

I have been thinking about this for some time now. One possible solution could be to implement opportunistic checks, assuming that the signature is correct, and returning an error only when the last chunk is read. This approach would significantly reduce latency for PUT operations, especially in proxy mode.

This solution is reasonable but I'm worried about the side effects. Attackers may be able to cost more resources than memory if they can execute PUT operations until the last chunk. We still need to limit this case.

We can investigate how other storage servers solve this problem.