JonathanGiles opened this issue 8 months ago
wrk is a great tool; I've used it before. We can also use JMH for components like TeenyJson. I'm not sure we need a script like profile.sh.
These are some results so far.
The same command was used as in java-httpserver-vthreads.
Cached Thread Pool
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.92ms 2.45ms 50.74ms 93.64%
Req/Sec 230.37 566.98 3.00k 87.19%
Latency Distribution
50% 1.30ms
75% 1.64ms
90% 2.92ms
99% 14.55ms
9274 requests in 1.00m, 1.80MB read
Socket errors: connect 0, read 10039, write 0, timeout 0
Requests/sec: 154.30
Transfer/sec: 30.59KB
Work-Stealing Pool
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.81ms 6.20ms 50.30ms 95.10%
Req/Sec 70.03 292.76 2.17k 95.34%
Latency Distribution
50% 2.05ms
75% 3.61ms
90% 6.00ms
99% 42.87ms
5393 requests in 1.00m, 1.04MB read
Socket errors: connect 0, read 5984, write 0, timeout 0
Requests/sec: 89.73
Transfer/sec: 17.79KB
Virtual Threads
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.96ms 6.78ms 103.99ms 97.99%
Req/Sec 93.91 374.23 2.52k 94.59%
Latency Distribution
50% 1.61ms
75% 2.77ms
90% 5.25ms
99% 13.76ms
7164 requests in 1.00m, 1.39MB read
Socket errors: connect 0, read 7509, write 0, timeout 0
Requests/sec: 119.23
Transfer/sec: 23.64KB
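For reference, the three strategies compared above can be created with the standard `Executors` factory methods. This is a minimal sketch of the comparison, not how Teeny actually wires its executor; `newVirtualThreadPerTaskExecutor()` requires Java 21+.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorStrategies {

    // The three strategies benchmarked above.
    public static ExecutorService cached() {
        return Executors.newCachedThreadPool();
    }

    public static ExecutorService workStealing() {
        return Executors.newWorkStealingPool();
    }

    public static ExecutorService virtual() {
        return Executors.newVirtualThreadPerTaskExecutor(); // Java 21+
    }

    // Run n no-op tasks on the given executor and return how many completed.
    public static int drain(ExecutorService pool, int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drain(cached(), 100));
        System.out.println(drain(workStealing(), 100));
        System.out.println(drain(virtual(), 100));
    }
}
```

Swapping between these is a one-line change, which makes it easy to re-run the same wrk command against each strategy.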
Environment
👉 Wrk Tool
We should keep the following in mind when working with virtual threads:
Monopolization is explained in the "Virtual threads are useful for I/O-bound workloads only" section: when running long computations, the JVM cannot unmount and switch to another virtual thread until the current one terminates, because the current scheduler does not support preempting tasks.
This monopolization can lead to the creation of new carrier threads to execute other virtual threads. Creating carrier threads means creating platform threads, so there is a memory cost associated with it.
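To illustrate the I/O-bound case where virtual threads shine: a blocking call such as `Thread.sleep()` (or socket I/O) unmounts the virtual thread from its carrier, so thousands of blocked tasks can share a handful of platform threads. A CPU-bound busy loop, by contrast, would monopolize its carrier as described above. A minimal sketch (Java 21+):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class VirtualThreadBlocking {

    // Start many virtual threads that all block for 100 ms. Because the
    // blocking call unmounts each virtual thread from its carrier, they
    // finish almost concurrently instead of taking tasks * 100 ms on a
    // small platform-thread pool.
    public static boolean runBlockingTasks(int tasks) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(100); // blocking call: the carrier is released here
                } catch (InterruptedException ignored) {
                }
                latch.countDown();
            });
        }
        return latch.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBlockingTasks(1_000));
    }
}
```

Replacing the `sleep` with a long computation is exactly the monopolization scenario quoted above: the scheduler cannot preempt it.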
Ready to upgrade to Java 21? 😬
I would like Teeny to have basic Prometheus metrics:
http_server_requests_seconds_sum{application="teeny",error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/store/pets",} 1.950221315
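A metric line like the one above is just the Prometheus text exposition format, so a first step could be as simple as rendering it from Teeny's own counters. A minimal sketch (the label set is illustrative; a real integration would more likely pull in the official Prometheus Java client or Micrometer):

```java
import java.util.Locale;

public class PrometheusFormat {

    // Render one sample in the Prometheus text exposition format,
    // shaped like the http_server_requests_seconds_sum line above.
    public static String sample(String name, String method, String uri,
                                int status, double value) {
        return String.format(Locale.ROOT,
                "%s{application=\"teeny\",method=\"%s\",status=\"%d\",uri=\"%s\",} %s",
                name, method, status, uri, value);
    }

    public static void main(String[] args) {
        System.out.println(sample("http_server_requests_seconds_sum",
                "POST", "/store/pets", 200, 1.950221315));
    }
}
```

Exposing these lines under a `/metrics` endpoint would be enough for a Prometheus server to scrape.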
Results against TestServer.java
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost/user/1/details
Running 1m test @ http://localhost/user/1/details
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.37ms 1.10ms 33.87ms 97.64%
Req/Sec 2.00k 1.49k 4.88k 50.63%
Latency Distribution
50% 1.20ms
75% 1.36ms
90% 1.80ms
99% 4.15ms
32277 requests in 1.00m, 3.91MB read
Socket errors: connect 0, read 32923, write 0, timeout 0
Requests/sec: 537.07
Transfer/sec: 66.61KB
Plenty of scope for perf gains then! :)
Just profiled Teeny (IntelliJ profiler) and it looks like there's a deadlock: everything performs well until around the 30th concurrent request. I made a few adjustments and then got ~850 req/sec. This will be fun 😀
At some point I'd really like to get virtual threads integrated at certain levels of the JDK usage, etc. It would be interesting to find other areas to improve.
I've been testing different strategies to make Teeny more performant, and I've reached ~1K req/sec. IMO the most important component for processing more requests is the ThreadPoolExecutor. I just looked at this ThreadPoolExecutor.java and this
Thoughts?
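Since `ThreadPoolExecutor` tuning came up: its throughput under load mostly comes down to core/max pool sizes, the work queue, and the rejection policy. A hand-tuned sketch (the sizes here are illustrative, not Teeny's actual configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TunedPool {

    // Bounded queue so bursts queue up instead of failing, and
    // CallerRunsPolicy as back-pressure when the queue is full: the
    // submitting thread runs the task itself, which naturally slows
    // down the accept loop instead of dropping requests.
    public static ThreadPoolExecutor create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(
                cores,               // corePoolSize: kept warm
                cores * 4,           // maximumPoolSize: burst capacity
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 5_000; i++) {
            pool.execute(handled::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(handled.get()); // all 5000 tasks handled, none rejected
    }
}
```

The default `AbortPolicy` would instead throw `RejectedExecutionException` once the queue fills, which under a wrk run shows up as dropped connections.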
Let's get the perf up!
Just implemented JMH on the TeenyJson branch; these are the results so far:
Benchmark Mode Cnt Score Error Units
JsonBenchmarks.decodingBenchmark thrpt 5 30337.625 ± 60.575 ops/s
JsonBenchmarks.encodingBenchmark thrpt 5 199324.808 ± 6876.177 ops/s
Oops, it looks like if you run Teeny from IntelliJ, Teeny will underperform.
These results are from running Teeny from the CLI:
java -jar target/teenyhttpd-1.0.6.jar
alex@Alexs-MacBook-Pro ~ % wrk -d 5s http://localhost/health
Running 5s test @ http://localhost/health
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.41ms 40.40ms 352.74ms 96.50%
Req/Sec 8.50k 3.16k 11.05k 88.89%
16380 requests in 5.04s, 1.84MB read
Socket errors: connect 0, read 16380, write 0, timeout 0
Requests/sec: 3252.73
Transfer/sec: 374.82KB
Teeny update:
alex@Alexs-MacBook-Pro ~ % ./bombardier -d 5s http://localhost:8080
Bombarding http://localhost:8080 for 5s using 125 connection(s)
[=========================================================================================================================================================] 5s
Done!
Statistics Avg Stdev Max
Reqs/sec 15177.85 3110.70 21408.30
Latency 8.20ms 9.08ms 218.10ms
HTTP codes:
1xx - 0, 2xx - 75834, 3xx - 0, 4xx - 0, 5xx - 0
others - 400
Errors:
the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 372
dial tcp [::1]:8080: connect: connection refused - 27
write tcp 127.0.0.1:53575->127.0.0.1:8080: write: broken pipe - 1
Throughput: 2.98MB/s
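The "server closed connection before returning the first response byte" errors above suggest the server is dropping connections without announcing it. Sending a `Connection: close` header before closing is what bombardier asks for. A minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer` for illustration; Teeny's own response-writing code would be the real place to set this header:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectionClose {

    // Start a tiny server on an ephemeral port whose responses announce
    // that the connection will be closed after the body is sent.
    public static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            // Tell the client not to expect keep-alive before closing.
            exchange.getResponseHeaders().set("Connection", "close");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
        server.stop(0);
    }
}
```

Alternatively, keeping connections alive (honoring keep-alive instead of closing per request) would likely help throughput far more than fixing the error message.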
It would be cool to use something like this to start performance benchmarking and improving the performance of TeenyHttpd.
Thoughts @alex-cova ?