This Java 21 project benchmarks a simple Spring Boot 3.3 microservice using configurable scenarios, comparing Java Virtual Threads (introduced by Project Loom, JEP 444) using Tomcat and Netty with Spring WebFlux (relying on Project Reactor) using Netty.
All benchmark results below come from a dedicated bare metal test environment. The benchmark is also scheduled to run monthly on GitHub-hosted runners, using all combinations of (Ubuntu 22.04, Ubuntu 24.04) and (Java 21, Java 23).
Both Spring WebFlux and Virtual Threads are alternative technologies to create Java microservices that support a high number of concurrent users, mapping all incoming requests to very few shared operating system threads. This reduces the resource overhead incurred by dedicating a single operating system thread to each user.
Spring WebFlux was first introduced in September 2017. Virtual Threads were first introduced as a preview feature with Java 19 and were fully rolled out with Java 21 in September 2023.
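For illustration only (this snippet is not taken from the benchmark code), the JEP 444 API hands each task its own virtual thread while the JVM multiplexes those onto a small pool of carrier OS threads:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        // One virtual thread per task; the JVM schedules them onto a few OS carrier threads.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // Blocking call: parks only the virtual thread.
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for all submitted tasks to complete.
    }
}
```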
[!NOTE] In a nutshell, the benchmark results are:
Virtual Threads on Netty (using blocking code) showed very similar and often superior performance characteristics (latency percentiles, requests per second, system load) compared with WebFlux on Netty (using non-blocking code and relying on Mono and Flux from Project Reactor):
- Virtual Threads on Netty was the benchmark winner for ca. 40% more combinations of metrics and benchmark scenarios than Project Reactor on Netty.
- For all high user count scenarios, it had the lowest latency as well as the largest number of requests for the entirety of each benchmark run.
- In many cases (e.g. 60k-vus-smooth-spike-get-post-movies), the 90th and 99th percentile latencies (P90 and P99) were considerably lower for Virtual Threads on Netty when compared with WebFlux on Netty.
- For both approaches, we could scale up to the same number of virtual users (and thus TCP connections) before exhausting the CPU and running into time-outs due to rejected TCP connection requests.
Virtual Threads on Tomcat are not recommended for high load:
- We saw considerably higher resource use compared with the two Netty-based approaches.
- There were many time-out errors, visualized by red dots in the charts, even when the CPU use was far below 100%. In contrast, none of the Netty-based scenarios experienced any errors, even at a CPU use of 100%.
Below are the top-performing approaches across all scenarios and metrics, visualizing the contents of `results/scenarios-default/results.csv`. Metrics include `requests_ok`, `requests_per_second`, and `sockets`. (1) indicates the overall best approach, (2) the runner-up, and so on. This overall ranking is also shown next to each metric value. This chart compares Project Loom (on both Tomcat and Netty) with Project Reactor (on Netty).
This chart is based on the same benchmark as before, but only considers the Netty-based approaches.
The benchmark is run via `benchmark.sh` and configured by a scenario CSV file such as `src/main/resources/scenarios/scenario.csv`. It is driven by k6, which repeatedly issues HTTP requests to a service listening at http://localhost:8080/.
The service exposes multiple REST endpoints. The implementation of each has the same 3 stages:
1. If `$delayCallDepth > 0`, call `GET /$approach/epoch-millis` recursively `$delayCallDepth` times to mimic calls to upstream service(s). By default, these calls use Spring Boot's WebFlux `WebClient` based on Netty. The multi-client scenarios additionally compare `WebClient` with Spring Boot's `RestClient` using various client implementations; for details see the Multi-Client Scenarios chapter.
2. If `$delayCallDepth = 0`, wait `$delayInMillis` (default: `100`) to mimic the delay incurred by a network call, filesystem access, or similar. The thread-based approaches (`platform-tomcat`, `loom-tomcat`, and `loom-netty`) use a blocking wait whilst the reactive approach (`webflux-netty`) uses a non-blocking wait (see the sketch after the sequence diagram below).
3. Perform the actual work of the endpoint, e.g. finding or saving movies.

Example: Get all movies using the `loom-netty` approach, an HTTP call depth of `1`, and a delay of `100` milliseconds:
```mermaid
sequenceDiagram
    participant k6s
    participant service
    k6s->>+service: GET /loom-netty/movies?delayCallDepth=1&delayMillis=100
    service->>+service: GET /loom-netty/epoch-millis?delayCallDepth=0&delayMillis=100
    service->>service: Wait 100 milliseconds
    service-->>-service: Return current epoch millis
    service->>service: Find movies
    service-->>-k6s: Return movies
```
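To make the difference between the blocking and non-blocking wait concrete, here is a minimal sketch (class and method names are illustrative, not the project's actual code) of how the `$delayInMillis` wait is typically expressed in each style:

```java
import java.time.Duration;
import reactor.core.publisher.Mono;

class DelaySketch {
    // Thread-per-request approaches (platform-tomcat, loom-tomcat, loom-netty): block the current thread.
    // On a virtual thread this parks only the virtual thread; its carrier OS thread is freed for other work.
    static void blockingWait(long delayInMillis) throws InterruptedException {
        Thread.sleep(delayInMillis);
    }

    // Reactive approach (webflux-netty): no thread blocks; Reactor emits a value after the delay.
    static Mono<Long> nonBlockingWait(long delayInMillis) {
        return Mono.delay(Duration.ofMillis(delayInMillis))
                .map(tick -> System.currentTimeMillis());
    }
}
```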
The microservice under test exposes several RESTful APIs. In the following descriptions, `$approach` is the approach under test and can be one of `loom-tomcat`, `loom-netty`, and `webflux-netty`.
All REST APIs support the following query parameters:
- `delayCallDepth`: Depth of the recursive HTTP call stack to the `$approach/epoch-millis` endpoint prior to the server-side delay; see Scenario Columns for more details.
- `delayInMillis`: Server-side delay in milliseconds; see Scenario Columns for more details.

The TimeController returns the milliseconds since the epoch, i.e. 1 Jan 1970:

`GET /$approach/epoch-millis`
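As a hypothetical sketch of such an endpoint for a thread-based approach (controller name, client wiring, and defaults are assumptions, not the project's actual classes; the thread-based approaches may use a different HTTP client), including the recursive `delayCallDepth` behaviour:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestClient;

@RestController
class EpochMillisControllerSketch {
    private final RestClient restClient = RestClient.create("http://localhost:8080");

    // Path shown for the loom-netty approach; each approach exposes its own prefix.
    @GetMapping("/loom-netty/epoch-millis")
    Long epochMillis(@RequestParam(defaultValue = "0") int delayCallDepth,
                     @RequestParam(defaultValue = "100") long delayInMillis) throws InterruptedException {
        if (delayCallDepth > 0) {
            // Mimic an upstream service by calling the same endpoint with a reduced call depth.
            return restClient.get()
                    .uri("/loom-netty/epoch-millis?delayCallDepth={depth}&delayInMillis={delay}",
                            delayCallDepth - 1, delayInMillis)
                    .retrieve()
                    .body(Long.class);
        }
        Thread.sleep(delayInMillis); // Blocking wait; cheap when running on a virtual thread.
        return System.currentTimeMillis();
    }
}
```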
The MovieController gets and saves movies which are stored in an H2 in-memory DB via Spring Data JPA, fronted by a Caffeine-backed Spring Boot cache:
DB Considerations:
- The `loom-webflux.repo-read-only` property in `src/main/resources/application.yaml` controls whether the repository is used read-only.
- A PostgreSQL DB can be used instead of H2 by specifying `postgres` in the `serverProfiles` column of the scenario CSV file. See `scenarios-postgres.csv` and the PostgreSQL results.

Supported requests:
- `GET /$approach/movies?directorLastName={director}`: Gets movies by director. Supported `{director}` values and their respective response body sizes in bytes, based on the default movies:
  - `Allen`: 1597 bytes (unindented)
  - `Hitchcock`: 1579 bytes (unindented)
  - `Kubrick`: 1198 bytes (unindented)
- `POST /$approach/movies`: Saves movies.
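A hedged sketch of how a Caffeine-cached, JPA-backed movie lookup is commonly wired in Spring Boot (entity, repository, and cache names here are illustrative assumptions, not the project's actual classes):

```java
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Service;

// Minimal JPA entity; the real entity will have more fields.
@Entity
class Movie {
    @Id
    Long id;
    String title;
    String directorLastName;
}

// Spring Data JPA repository backed by the H2 (or PostgreSQL) database.
interface MovieRepository extends JpaRepository<Movie, Long> {
    List<Movie> findByDirectorLastName(String directorLastName);
}

@Service
class MovieServiceSketch {
    private final MovieRepository repository;

    MovieServiceSketch(MovieRepository repository) {
        this.repository = repository;
    }

    // With caching enabled (@EnableCaching) and Caffeine on the classpath, repeated lookups
    // for the same director are served from the cache instead of hitting the DB.
    @Cacheable("moviesByDirector")
    public List<Movie> findByDirector(String directorLastName) {
        return repository.findByDirectorLastName(directorLastName);
    }
}
```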
The hardware requirements depend purely on the scenarios configured in `src/main/resources/scenarios/scenarios-default.csv`. The following is recommended to run the default scenarios committed to this repo:
The following instructions assume you are using a Debian-based Linux such as Ubuntu 22.04 or 24.04.
You'll need Java 21 or above:
```shell
sudo apt install openjdk-21-jdk
```
k6 is used to load the service:
```shell
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
```
Python 3 and `matplotlib` are used to convert the CSV output of `k6` and `sar`/`sadf` to a single PNG chart. The `sar` and `sadf` tools come as part of `sysstat` and are used to measure resource use. To install them, run:
```shell
sudo apt update && sudo apt install -y python3 python3-matplotlib sysstat
```
The following adjustments optimize Linux for HTTP load tests.
Ensure your system can handle a large number of concurrent connections:
```shell
printf '* soft nofile 1048576\n* hard nofile 1048576\n' | sudo tee -a /etc/security/limits.conf
```
Increase the port range for outgoing TCP connections and allow quick connection reuse:
```shell
printf 'net.ipv4.ip_local_port_range=1024 65535\nnet.ipv4.tcp_tw_reuse = 1\n' | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
```
Log out and back in.
Run a benchmark for each combination of approaches and scenarios defined in a scenario CSV file. Results are stored in `build/results/`:

```shell
./benchmark.sh
```
Usage as per `benchmark.sh -h`:

```
Usage: benchmark.sh [OPTION]... [SCENARIO_FILE]
Runs benchmarks configured by a scenario file.
SCENARIO_FILE: Scenario configuration CSV file in src/main/resources/scenarios/. Default: scenarios-default.csv
OPTION:
  -a <approaches>  Comma-separated list of approaches to test. Default: loom-tomcat, loom-netty, webflux-netty
                   Supported approaches: platform-tomcat, loom-tomcat, loom-netty, webflux-netty
  -C               Keep CSV files used to create chart. Default: false
  -h               Print this help
```
This is a wrapper over `benchmark.sh` and supports multiple scenario files:

```shell
./benchmarks.sh
```
Usage as per `benchmarks.sh -h`:

```
Usage: benchmarks.sh [OPTION]... [SCENARIO_FILE]...
Wrapper over benchmark.sh that supports multiple scenario files and optionally suspends the system on completion.
SCENARIO_FILE: Zero or more space-separated scenario configuration CSV files in src/main/resources/scenarios/.
  Default: scenarios-default.csv scenarios-deep-call-stack.csv scenarios-postgres.csv scenarios-sharp-spikes.csv scenarios-soaktest.csv
OPTION:
  -d, --dry-run           Print what would be done without actually performing it.
  -k, --kill-java         Kill all Java processes after each benchmark. Default: false
  -o, --options "<opts>"  Pass additional options to the benchmark.sh script. Run "./benchmark.sh -h" for supported options.
  -s, --suspend           Suspend the system upon completion of the script. Default: false
  -h, --help              Show this help message and exit.
```
Please note that the scenarios configured by default may take several hours to complete.
All approaches use the same Spring Boot 3.3 version.
See `src/main/resources/scenarios/scenarios-default.csv`:
Scenario | Domain | Description | Virtual Users (VU) | Requests per Second (RPS) | Client delay (ms) | Server delay (ms) | Delay Call Depth |
---|---|---|---|---|---|---|---|
smoketest | Time | Smoke test | 5 | 5 | 0 | 100 | 0 |
5k-vus-and-rps-get-time | Time | Constant users, constant request rate | 5,000 | 5,000 | 0 | 100 | 0 |
5k-vus-and-rps-get-movies | Movies | Constant users, constant request rate | 5,000 | 5,000 | 0 | 100 | 0 |
10k-vus-and-rps-get-movies | Movies | Constant users, constant request rate | 10,000 | 10,000 | 0 | 100 | 0 |
10k-vus-and-rps-get-movies-call-depth-1 | Movies | Constant users, constant request rate | 10,000 | 10,000 | 0 | 100 | 1 |
20k-vus-stepped-spike-get-movies | Movies | Stepped user spike | 0 - 20,000 | Depends on users and delays | 1000 - 3000 (random) | 100 | 0 |
20k-vus-smooth-spike-get-movies | Movies | Smooth user spike | 0 - 20,000 | Depends on users and delays | 1000 - 3000 (random) | 100 | 0 |
20k-vus-smooth-spike-get-post-movies | Movies | Smooth user spike | 0 - 20,000 | Depends on users and delays | 1000 - 3000 (random) | 100 | 0 |
20k-vus-smooth-spike-get-post-movies-call-depth-1 | Movies | Smooth user spike | 0 - 20,000 | Depends on users and delays | 1000 - 3000 (random) | 100 | 1 |
The scenarios examine particularly high load.
These scenarios compare Spring Boot's `RestClient` and `WebClient` implementations with each other. All scenarios except those tested with the `webflux-netty` approach use the `WebClient` or `RestClient` implementation specified in the scenario name. However, the `webflux-netty` approach is fully reactive and therefore always uses the non-blocking `WebClient`.
The following clients are compared:
- `RestClient` based on:
- `WebClient` based on:
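For context, here is a generic sketch (URLs and wiring are assumptions, not the benchmark's actual code) of how the two client styles differ at the call site: the `RestClient` call blocks until the response body arrives, whereas the `WebClient` call returns a `Mono` and nothing happens until subscription:

```java
import org.springframework.web.client.RestClient;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

class ClientSketch {
    // Blocking style: returns once the response body has arrived.
    static Long viaRestClient() {
        RestClient restClient = RestClient.create("http://localhost:8080");
        return restClient.get()
                .uri("/loom-netty/epoch-millis")
                .retrieve()
                .body(Long.class);
    }

    // Reactive style: returns immediately with a Mono; no thread blocks waiting for the response.
    static Mono<Long> viaWebClient() {
        WebClient webClient = WebClient.create("http://localhost:8080");
        return webClient.get()
                .uri("/webflux-netty/epoch-millis")
                .retrieve()
                .bodyToMono(Long.class);
    }
}
```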
The benchmark run for each `$scenario` consists of the following phases and steps:
- Start the service with `$approach` as Spring Boot profile, using the config in `src/main/resources/application.yaml`, overridden by `src/main/resources/application-$approach.yaml` if defined.
- Run the k6 test for `$scenario`.
- For each `$resultType` (i.e. `latency`, `system`, or `jvm`), create a CSV file at `build/results/$scenario/$approach-$resultType.csv`.
- Convert the CSV files into a single chart at `build/results/$scenario/$approach.png`.
- Delete the CSV files, unless the `-C` CLI option was specified.

Notes:
- The `build.gradle` file configures the heap space to 2 GiB.
- The `src/main/resources/application.yaml` file enables HTTP/2.

Each line in `src/main/resources/scenarios/scenarios-default.csv` configures a test scenario which is performed first for Java Virtual Threads, then for WebFlux.
scenario | k6Config | serverProfiles | delayCallDepth | delayInMillis | connections | requestsPerSecond | warmupDurationInSeconds | testDurationInSeconds |
---|---|---|---|---|---|---|---|---|
5k-vus-and-rps-get-time | get-time.js | | 0 | 100 | 5000 | 5000 | 10 | 300 |
20k-vus-smooth-spike-get-movies | k6-20k-vus-smooth-spike-get-movies.js | postgres | 0 | 100 | 20000 | | 0 | 300 |
- `scenario`: Name of the scenario. Is printed on top of each diagram.
- `k6Config`: Name of the k6 config file which is assumed to be in the `config` folder.
- `serverProfiles`: Pipe-delimited Spring profiles which are also used to start and stop Docker containers. For example, specifying the value `postgres|no-cache` has these effects:
  - The profiles `postgres,no-cache` are added to the default Spring Boot profile of `$approach`.
  - `src/main/docker/docker-compose-postgres.yaml` and `src/main/docker/docker-compose-no-cache.yaml` (if existent) are used to start/stop Docker containers before/after each scenario run.
- `delayCallDepth`: Depth of the recursive HTTP call stack to the `$approach/epoch-millis` endpoint prior to the server-side delay.
  - `0` means that the service waits for `$delayInMillis` milliseconds immediately upon receiving a request.
  - A value above `0` means that the service first calls `$approach/epoch-millis` with `${delayCallDepth - 1}`, so the total depth of recursive calls is `$delayCallDepth`.
- `delayInMillis`: Server-side delay of each request, in milliseconds. Mimics a delay such as invoking a DB, which allows for reuse of the current platform thread.
- `connections`: Number of TCP connections, i.e. virtual users.
- `requestsPerSecond`: Number of requests per second across all connections. Left empty for scenarios where the number of requests per second is organically derived based on the number of connections, the request latency, and any explicit client-side delays.
- `warmupDurationInSeconds`: Duration of the warm-up iteration before the actual test. Warm-up is skipped if `0`.
- `testDurationInSeconds`: Duration of the test iteration.

The following charts show the results of each scenario, sorted by ascending scenario load.
Any lines in the client-side or server-side log files which contain the term `error` (case-insensitive) are preserved. You can find them in error log files, located in the results folder alongside the generated PNG files.
Any failed requests appear both in the latency chart as red dots and in the RPS chart as part of a continuous orange line. Additionally, they leave a trace in the `$approach-latency.csv` file, if preserved by running the benchmark with the `-C` option:
- A request whose TCP connection could not be established leaves a trace such as `1715728866471,0.000000,0,dial: i/o timeout,1211`. Such requests are not considered when reporting minimum latency since this could obscure the minimum latency of successful requests.
- A request that timed out leaves a trace such as `1715728861008,60001.327066,0,request timeout,1050`.
This scenario aims to maintain a steady number of 5k virtual users (VUs, i.e. TCP connections) as well as 5k requests per second (RPS) across all users for 5 minutes:
Like the previous scenario, but the response body contains a JSON of movies.
For further details, please see the movies section.
Like the previous scenario, but with 10k virtual users and requests per second.
Like the previous scenario, but mimics a request to an upstream service.
This scenario ramps up virtual users (and thus TCP connections) from 0 to 20k in multiple steps, then back down:
Like the previous scenario, but with a linear ramp-up and ramp-down.
Like the previous scenario, but instead of just getting movies, we are now additionally saving them:
For further details, please see the movies section.
Like the previous scenario, but mimics a call to an upstream service as explained in 10k-vus-and-rps-get-movies-call-depth-1.
[!NOTE] For `loom-netty` and `webflux-netty`, this scenario was CPU-contended on the test environment upon reaching ca. 5,000 RPS. Whilst causing no errors, this drastically increased latencies.
The following results are based on `scenarios-high-load.csv`, which scales up to 60k users. They were executed in a VirtualBox VM on more powerful hardware and using a different Linux kernel version.
Like 20k-vus-smooth-spike-get-post-movies, but scaling up to 60k users.