clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Add support for aggregating all inbound data before calling server Ring handler #692

Open KingMob opened 9 months ago

KingMob commented 9 months ago

About

The official Ring spec states that the body of a request should be an InputStream, and Aleph complies with this (when not in raw mode).

However, InputStreams aren't always desirable, because they're a blocking interface.

If no data is available (and you haven't reached EOS), any read calls will block. Block on enough of these in your Ring handlers, and you can run out of threads/resources. This limitation propagates to many common transformations of your InputStream, including slurp and bs/to-string.

One way around this is to let Aleph/Netty accumulate all the body ByteBufs before calling the user handler with the Ring request map. Netty won't consume extra threads for this, though of course, memory will be consumed.

Aggregation currently happens automatically if you set the max-request-body-size option, but that's an implementation detail. This issue aims to be more explicit.

Proposal

Add an option like aggregate-data?, wait-for-body-to-finish?, or whatever, which adds aggregation. For HTTP1, it would be the HttpObjectAggregator. For HTTP2, it'll use the custom aggregator used for the body size limit.

While we shouldn't always force a size limit, we must still apply some limits, or a trivial DoS attack will be to send an infinite stream to a server, and force it to consume all memory.

We can either apply a size limit, a time limit, or both. If the aggregation option is set, so must one of the limit options. We can stick with max-request-body-size, but we should probably add a new one for a time limit. Maybe max-request-time or max-execution-time. We already have a bunch of timeouts on the client side, like request-timeout and read-timeout, maybe we should call it read-timeout for consistency.

DerGuteMoritz commented 9 months ago

Maybe max-request-time or max-execution-time. We already have a bunch of timeouts on the client side, like request-timeout and read-timeout, maybe we should call it read-timeout for consistency.

:+1: for naming it read-timeout for the stated reason. Also, the suggested max-request-time and especially max-execution-time could be misinterpreted to also cover the handler's execution.

Another limit that could be relevant in this context is a minimum transmission rate to counter slowloris-style attacks. See e.g. what Apache's mod_reqtimeout offers.

KingMob commented 9 months ago

Good point. We may also want to add response timeouts, and I think max-execution-time would be good for that. Currently, Aleph doesn't have many timeouts on the server side. I suspect other peers in the chain (like load balancers, reverse proxies, etc) might have timeouts that will apply, but aleph should still have them.