Open giuliopulina opened 6 months ago
Yes, it could be interesting. Thank you
Hi @lucapiccinelli,
I did a bit of work on my fork of this project.
Almost immediately, while trying to run the existing Spring project with one thread per request, I hit an issue: the application was constantly throwing OOM errors after a short time. Looking at VisualVM, I discovered that, for each WebSocket connection, a huge amount of memory (40 MB!) was retained in a buffer and the GC was not able to free it, because it was referenced by the Tomcat session:
After some analysis, I discovered that the cause of the issue was this call in ServerWebSocketHandler:

```java
@Override
public void afterConnectionEstablished(WebSocketSession session) {
    session.setTextMessageSizeLimit(20000000);
}
```
It seems that the code above makes java.nio.HeapCharBuffer preallocate a char[] with the maximum size of the text message, i.e. 2 bytes * 20,000,000 = 40 MB. By simply removing this setting, I could run the test without any issues and with an error rate of 0%. Incidentally, the issue happens only on Tomcat, so it did not affect the Spring Webflux project. That said, I think this problem could have affected the published results of the benchmark.
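For context, the arithmetic behind the retained buffer is straightforward: Java chars are two bytes each, so a HeapCharBuffer sized to the 20,000,000-character limit holds 40 MB per connection. A minimal sketch of that calculation (the class and variable names here are illustrative, not from the project):

```java
public class TextBufferSizeDemo {
    public static void main(String[] args) {
        // Value passed to setTextMessageSizeLimit in the handler above
        int textMessageSizeLimit = 20_000_000;
        // A char buffer of that capacity occupies 2 bytes per char
        long bytes = (long) textMessageSizeLimit * Character.BYTES;
        System.out.println(bytes / 1_000_000 + " MB retained per connection");
        // prints "40 MB retained per connection"
    }
}
```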
Next activities performed:
worked on the JMeter file to, eventually, include the following scenarios (for all of them, I slightly increased the threads' ramp-up time to solve some issues with Ktor, which was not able to create all the connections at once; I also reduced the duration to 10 minutes):
Focusing just on the 'hard' tests, I was surprised to find that the Servlet stack outperforms both Ktor and Spring Webflux. Jetty and Tomcat have similar throughput values, while Ktor and Webflux throughput is much lower.
Tomcat servlet stack (hard)
Jetty servlet stack (hard)
Ktor (hard)
Spring webflux (hard)
Furthermore, it seems that introducing virtual threads doesn't make any relevant difference.
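For reference, one standard way to opt the embedded Tomcat/Jetty container into virtual threads, assuming the Spring modules run on Spring Boot 3.2 or later on JDK 21 (my fork may wire this differently), is a single configuration property:

```properties
# application.properties -- run servlet request handling on virtual threads
spring.threads.virtual.enabled=true
```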
Tomcat servlet stack with virtual threads (hard)
Jetty servlet stack with virtual threads (hard)
What do you think about the above results?
Thanks, Giulio
Hello @giuliopulina. Thank you for your investigations. Your findings are really interesting and valuable from many perspectives for us. Unfortunately, I can't promise that we are going to check and fix the article in the short term, because blogging and open-sourcing is just an experiment for us at the moment, so we still struggle to allocate resources. But this kind of contribution is exactly what we aimed for by publishing this content, so thank you very much.
For the moment I will involve @mattrcc95 in the discussion. He did the benchmark, so I think he will be interested in having a look at your work.
> Hello @giuliopulina. Thank you for your investigations. Your findings are really interesting and valuable from many perspectives for us. Unfortunately, I can't promise that we are going to check and fix the article in the short term, because blogging and open-sourcing is just an experiment for us at the moment, so we still struggle to allocate resources.
This is completely understandable!
> But this kind of contribution is exactly what we aimed for by publishing this content, so thank you very much. For the moment I will involve @mattrcc95 in the discussion. He did the benchmark, so I think he will be interested in having a look at your work.
Thanks, and please don't hesitate to ask for clarifications about my work. I think that if the benchmark is modified by adding a blocking call before sending the message back to the websocket client (simulating a DB query, for example):
```java
public void handleTextMessage(WebSocketSession callerSession, @NotNull TextMessage toForward) throws IOException {
    String content = toForward.getPayload();
    blockingCall();
    callerSession.sendMessage(new TextMessage(content));
}
```
then the one-thread-per-request model would not perform so well, and performance could probably improve noticeably by introducing virtual threads or switching to the reactive model.
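To illustrate why a blocking call changes the picture: with one platform thread per request, each blocked thread is parked for the full duration of the call, while a virtual thread releases its carrier while waiting. A self-contained sketch on JDK 21 (the 50 ms blockingCall here is a hypothetical stand-in for a DB query, not the project's code):

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingCallDemo {
    // Hypothetical stand-in for a DB query: blocks the calling thread ~50 ms.
    static void blockingCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int tasks = 1_000;
        long start = System.nanoTime();
        // One virtual thread per task: blocked virtual threads release their
        // carriers, so 1,000 * 50 ms of blocking overlaps almost entirely.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                pool.submit(BlockingCallDemo::blockingCall);
            }
        } // close() waits for all submitted tasks to finish
        long elapsedMs = Duration.ofNanos(System.nanoTime() - start).toMillis();
        System.out.println("1000 blocking tasks finished in ~" + elapsedMs + " ms");
    }
}
```

For comparison, the same 1,000 tasks on a fixed pool of, say, 10 platform threads would need around 5 seconds of wall time (1,000 * 50 ms / 10), which is exactly the gap the reactive and virtual-thread models aim to close.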
I'm considering investigating how virtual threads perform in this benchmark. Would you find that interesting? If so, I could work on implementing it and submit a pull request.