Verticles deployed with multiple instances do not scale processing evenly over the event loop threads

pantinor commented 4 years ago

Observation

Observed during profiling cpu processing that vertx event loop threads are not evenly utilized in the scenario where multiple verticle instances are deployed.

Expectation

Expected that with multi-reactor pattern, the event loop thread processing could be delegated evenly over the number of processing cores of the server.

Version

3.9.1

Context

During performance testing with vertx http servers listening and created over multiple verticle instances, we see in profiling that only 1 of the vert event loop threads are using for processing incoming http requests handled on the Router.

Steps to reproduce

Start a http server using code similar to below test listeing on port 8080.
Start http traffic into the server with a client tool. I used 5 requests per second with client written using vertx and http2 protocol. I can show the code for this client if needed.
With a profiler (used yourkit), observe the threads cpu utilization and filter out the vertx event loop threads.
Observe only 1 of the threads is used while the others are used in processing.

Extra

public class WorkerTest {

    public void testServer() throws Exception {

        final Vertx vertx = Vertx.vertx(new VertxOptions());

        vertx.exceptionHandler(new Handler<Throwable>() {
            @Override
            public void handle(Throwable ex) {
                ex.printStackTrace();
            }
        });

        int threadCount = Runtime.getRuntime().availableProcessors() * 2;

        DeploymentOptions dop = new DeploymentOptions()
        .setInstances(threadCount);

        vertx.deployVerticle(TestVerticle.class.getName(), dop, (event) -> {
            if (!event.failed()) {

            } else {
                event.cause().printStackTrace();
            }
        });

        Thread.sleep(120_000_000);
    }

    public static class TestVerticle extends AbstractVerticle {

        @Override
        public void start() throws Exception {

            System.out.println("verticle starting " + this.deploymentID() + " context " + this.context.hashCode() + " inst count " + this.context.getInstanceCount());

            final Router router = Router.router(this.vertx);

            router.putWithRegex("/nnrf-nfm/.*").handler(h -> {
                HttpServerResponse response = h.response();
                response.setStatusCode(200);
                response.end();
            });

            HttpServer httpServer = this.vertx
                    .createHttpServer(new HttpServerOptions().setHost("localhost").setPort(8080))
                    .requestHandler(router)
                    .listen();
        }
    }

}

jponge commented 4 years ago

This is very likely because the client is keeping and reusing the connection to the server.

pantinor commented 4 years ago

Hi Julien, I was able to confirm that when using separate connections per request, the processing is evenly distributed over the threads. We have a use case where the http2 client will be reusing the connection. Can we do anything with the vertx API to have this scaled over the threads with the single connection and multiple http2 streams scenario?

jponge commented 4 years ago

Perhaps you could decouple receiving HTTP requests and actual processing, say 1 front verticle for the HTTP work and N verticles to process requests, and glue these verticles over the event-bus.

vietj commented 4 years ago

@pantinor this is indeed due to HTTP/2 behavior to use a single physical connection and multiplex many streams.

@jponge suggestion is correct

In Vert.x 4 we might be able to implement such load balancing behavior at the HTTP level in later versions.

vietj commented 4 years ago

@pantinor note you get the same behavior with HTTP/1 persistent connections but to a lesser degree

vietj commented 4 years ago

@pantinor you can also mitigate this effect with HTTP/2 and force the client to use less streams by changing the concurrency in HttpServerOptions using Http2Settings with maxConcurrentStream value set to the value you like, e.g with 5 you will get at most 5 streams and the client might either do less requests or open more connections to satisfy the demand.

vietj commented 4 years ago

In Vert.x HTTP client you have a http2MaxPoolSize equals to 1 by default but you can increase it.

vietj commented 4 years ago

I labelled this issue as question rather

pantinor commented 4 years ago

Thanks for the feedback. We implemented the load balancing over the event loop threads by deploying multiple verticle instances. And will take note of the http2 max concurrent streams setting. That helps as well. Was not sure about the approach that Julien suggested regarding the decoupling of the stream handling from the connection's verticle handler. If there is an example that outlines how that can be set up it would be helpful. Thanks.

jponge commented 4 years ago

I don't see a ready example here but the idea is like:

HttpVerticle handles incoming HTTP traffic, and for anything that requires processing, forwards/sends to some event-bus destination like incoming.data.ingestion, and awaits a reply
ProcessingVerticle listens to incoming.data.ingestion, receives the event and does some work, then replies to the message
HttpVerticle gets some reply from the request on the event-bus, and pushes back some payload over HTTP (or none, depends what you are doing).

msonhub commented 4 years ago

@jponge : Any downside of deploying the HttpVerticle (that handles incoming HTTP traffic) as a worker ? In that case, the handler of each verticle's http-server is invoked by a worker-pool-thread; is it not? Please advise.

jponge commented 4 years ago

@msonhub I/O processing should be done on an event-loop, so deploying as a worker verticle is not what you want.

msonhub commented 4 years ago

@jponge : Thank you for our prompt note and yes, I'll follow that. In any case, what is the reason to not run I/O processing on a worker-verticle? Does it cause contention between the worker-pool-threads when receiving requests ? When verticle is a worker, is there an event-loop (under the cover) to dispatch to worker-pool-thread?

vietj commented 4 years ago

yes there is such dispatch done from the event-loop to a worker pool.

On Tue, Sep 8, 2020 at 11:17 PM msonhub notifications@github.com wrote:

@jponge https://github.com/jponge : Thank you for our prompt note and yes, I'll follow that. In any case, what is the reason to not run I/O processing on a worker-verticle? Does it cause contention between the worker-pool-threads when receiving requests ? When verticle is a worker, is there an event-loop (under the cover) to dispatch to worker-pool-thread?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/3548#issuecomment-689141204, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCQG3NL3BACVRS3NGJLSE2NPBANCNFSM4QTAFL6A .

pantinor commented 4 years ago

Regards to the event loop and worker pool mentioned above, instead of using the message bus approach, is there a way to use the event loop threads for the IO processing, and concurrently associate worker threads to the verticle instances to offload the http messages (ie without message bus usage). I can see potentially a problem that the context is lost for the tcp stream associated to the request + response perhaps?

In your experiences, have you seen any vertx http server implementations offloading the message processing to worker threads in the verticle instances from the event loop threads? In our case, we have generally been keeping the logic processing performed by the same thread doing the IO processing. ..

vietj commented 4 years ago

You can deploy worker verticles, all request/responses will be load balanced on the worker pool.

On Wed, Sep 9, 2020 at 3:22 PM pantinor notifications@github.com wrote:

Regards to the event loop and worker pool mentioned above, instead of using the message bus approach, is there a way to use the event loop threads for the IO processing, and concurrently associate worker threads to the verticle instances to offload the http messages (ie without message bus usage). I can see potentially a problem that the context is lost for the tcp stream associated to the request + response perhaps?

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/3548#issuecomment-689559832, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCWGRBNKZJQXXLLK753SE56RJANCNFSM4QTAFL6A .

pantinor commented 4 years ago

OK thanks. I saw the HTTP2 support for worker threads is added as an enhancement to the 4.0.0 milestone builds, and is not supported in 3.9.x. We will try the upgrade to 4.0.0. Thanks.

vietj commented 4 years ago

indeed in 4.0 we can support worker for HTTP/2, we might also consider supporting dispatching to multiple event loops in 4.x (which is your use case) for HTTP/2 (perhaps for HTTP/1 too)

On Thu, Sep 10, 2020 at 2:50 PM pantinor notifications@github.com wrote:

OK thanks. I saw the HTTP2 support for worker threads is added as an enhancement to the 4.0.0 milestone builds https://github.com/eclipse-vertx/vert.x/issues/3216, and is not supported in 3.9.x. We will try the upgrade to 4.0.0. Thanks.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/eclipse-vertx/vert.x/issues/3548#issuecomment-690263371, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABXDCSEJEIKIYXKIZLOARDSFDDS3ANCNFSM4QTAFL6A .

pantinor commented 4 years ago

@vietj with the 4.0.0 code, I tried offloading to the worker verticles from the event loop using the executeBlocking function like below. It seems to do what we are trying to do. Do you any problem here? The only issue I saw was that the worker pool size was 20 threads instead of the 5 that I indicated on the verticle deployment for the event loop threads. Also, can this below work on 3.9.x release too? I could try it and let you know.

` router.putWithRegex("/nnrf-nfm/.*").handler(routingContext -> { vertx.executeBlocking(fut -> {

    System.out.printf("Processing on thread %s\n", Thread.currentThread().getName());

    try {
        Thread.sleep(10);//some logic processing that take awhile
    } catch (Exception e) {
    }

    HttpServerResponse response = routingContext.response();
    response.setStatusCode(200);
    response.end();

    fut.complete();
}, false, res -> {
    //done
});

});

`

eclipse-vertx / vert.x