akka / akka-http

The Streaming-first HTTP server/module of Akka
https://doc.akka.io/docs/akka-http

Avoid stream contention/starvation between substreams on HTTP/2 connection #4150

Open jrudolph opened 2 years ago

jrudolph commented 2 years ago

As observed in https://github.com/lightbend/kalix-jvm-sdk/issues/1078, the HTTP/2 substream Sinks materialized by the HTTP/2 infrastructure use the subFusingMaterializer, so they run fused with the main HTTP/2 connection infrastructure:

https://github.com/akka/akka-http/blob/2d1b8727d74d2332de294da3d0cfeba40e12bdcb/akka-http-core/src/main/scala/akka/http/impl/engine/http2/Http2StreamHandling.scala#L644

This 1) can lead to starvation if one of the substreams does CPU-intensive work (or even sleeps) inside the stream, and 2) limits parallelization between concurrent substreams.

A well-behaved streaming application does not run CPU-intensive (or blocking) payloads directly on the stream, so in many cases this will not become a problem.
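
To make the distinction concrete, here is a minimal, hypothetical sketch (SubstreamContentionSketch and expensiveTransform are made-up names, not taken from the linked code): work done directly in a stream operator runs fused with the connection, while offloading it to a Future via mapAsync keeps the connection stream responsive.

import akka.actor.ActorSystem
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.util.ByteString

import scala.concurrent.Future

object SubstreamContentionSketch {
  implicit val system: ActorSystem = ActorSystem("sketch")
  import system.dispatcher

  // Made-up stand-in for CPU-intensive or blocking user work.
  def expensiveTransform(chunk: ByteString): ByteString = {
    Thread.sleep(50) // blocking here is exactly what starves other substreams
    chunk
  }

  // Problematic: `map` runs fused with the HTTP/2 connection infrastructure,
  // so this work stalls every other substream on the same connection.
  val blockingRoute: Route =
    extractRequestEntity { entity =>
      complete(entity.dataBytes.map(expensiveTransform).runFold(0L)(_ + _.size).map(_.toString))
    }

  // Better behaved: `mapAsync` hands the work to a Future (ideally on a
  // dedicated blocking dispatcher), keeping the connection stream responsive.
  val offloadedRoute: Route =
    extractRequestEntity { entity =>
      complete(
        entity.dataBytes
          .mapAsync(parallelism = 4)(chunk => Future(expensiveTransform(chunk)))
          .runFold(0L)(_ + _.size)
          .map(_.toString))
    }
}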

There are scenarios, however, where most traffic to a server arrives on a single connection (e.g. behind a load balancer) with many concurrent streams expected. In that case, running all the substreams in the same stream as the connection infrastructure might be too much.

Note that this mostly applies to streaming requests/responses. Requests that can be handed out with Strict entities (i.e. when collecting the full entity data, as enabled by min-collect-strict-entity-size, was successful) and strict responses are not affected by this issue.
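
For reference, a minimal sketch of enabling that collection via configuration. The exact config path is an assumption derived from the setting name above; verify it against the reference.conf of your akka-http version.

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object StrictEntityConfigSketch extends App {
  // Assumed config path, derived from the setting name in the discussion --
  // check the reference.conf of your akka-http version.
  val config = ConfigFactory
    .parseString("akka.http.server.http2.min-collect-strict-entity-size = 4096")
    .withFallback(ConfigFactory.load())

  val system = ActorSystem("http2-server", config)
}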

jrudolph commented 2 years ago

I'm not completely sure that something should be done here.

It seems consistent to avoid running user code inside of the connection stream and to steer away from using the subFusingMaterializer in these cases, in the same way that we already run the handler passed to bind in a Future automatically to avoid stream contention. The extra cost, compared to the overall cost of stream materialization in these cases, might just be acceptable.

On the other hand, changing this behavior will add extra work that cannot be avoided by the user any more (unless we introduce another flag).
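
To illustrate the trade-off, a minimal sketch of such a Future hop (not the actual akka-http internals; detach is a hypothetical helper name). The extra Future is exactly the per-request scheduling step that could no longer be avoided.

import akka.actor.ActorSystem
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}

import scala.concurrent.Future

object DetachSketch {
  // Hypothetical helper, not akka-http API: hopping onto a Future detaches
  // user code from the connection's stream execution, at the cost of one
  // extra scheduling step per request.
  def detach(handler: HttpRequest => Future[HttpResponse])(
      implicit system: ActorSystem): HttpRequest => Future[HttpResponse] = {
    import system.dispatcher
    request => Future(handler(request)).flatten
  }
}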

johanandren commented 2 years ago

If we leave it as is, I think it would be good to at least mention this in the docs, since from the user API it is not obvious that what look like separate invocations are not actually parallel.

jrudolph commented 2 years ago

H2ServerProcessingBenchmark seems to show a big difference (though the magnitude seems weird to me...):

jmh:run  -wi 5 -w 3 -i 4 -r 5 -f 1 -p requestbody=empty -p responsetype=closedelimited H2Server

With subFusingMaterializer:

[info] Benchmark                                           (minStrictEntitySize)  (requestbody)  (responsetype)   Mode  Cnt      Score     Error  Units
[info] H2ServerProcessingBenchmark.benchRequestProcessing                      1          empty  closedelimited  thrpt    4  68145.489 ± 954.890  ops/s

With materializer:

[info] Benchmark                                           (minStrictEntitySize)  (requestbody)  (responsetype)   Mode  Cnt      Score      Error  Units
[info] H2ServerProcessingBenchmark.benchRequestProcessing                      1          empty  closedelimited  thrpt    4  50164.835 ± 6151.670  ops/s

jrudolph commented 2 years ago

H2ServerProcessingBenchmark seems to show a big difference (though the magnitude seems weird to me...):

Looking at the flamegraphs, it looks legit. Materialization is expensive, and for small stream graphs, creating and tearing down the actors is the most expensive part of materialization (we knew this before).

The benchmark does not include the network stack, so, of course, the relative impact will be much smaller once that is taken into account as well.

In summary, it would not be great to change this without a way to configure it, since we spent so much time optimizing many code paths...

He-Pin commented 2 years ago

In Helidon Níma, every substream is run on a dedicated virtual thread.
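
For comparison, a minimal sketch of that model on the JVM (assuming JDK 21+; this illustrates the virtual-thread-per-task approach in general, not an akka-http feature, and handleSubstream is a made-up name):

import java.util.concurrent.Executors

import scala.concurrent.{ExecutionContext, Future}

object VirtualThreadSketch {
  // One virtual thread per task: blocking in one handler does not starve others.
  private val vtExecutor = Executors.newVirtualThreadPerTaskExecutor()
  implicit val vtEc: ExecutionContext = ExecutionContext.fromExecutor(vtExecutor)

  def handleSubstream(payload: String): Future[String] =
    Future {
      Thread.sleep(50) // cheap to block on a virtual thread
      payload.toUpperCase
    }
}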