JonathanGiles opened this issue 8 months ago
wrk is a great tool; I've used it before. We can also use JMH for components like TeenyJson. I'm not sure we need a script like profile.sh.
These are some results so far.
The same command was used as in java-httpserver-vthreads.
Cached Thread Pool
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.92ms 2.45ms 50.74ms 93.64%
Req/Sec 230.37 566.98 3.00k 87.19%
Latency Distribution
50% 1.30ms
75% 1.64ms
90% 2.92ms
99% 14.55ms
9274 requests in 1.00m, 1.80MB read
Socket errors: connect 0, read 10039, write 0, timeout 0
Requests/sec: 154.30
Transfer/sec: 30.59KB
Work-Stealing Pool
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.81ms 6.20ms 50.30ms 95.10%
Req/Sec 70.03 292.76 2.17k 95.34%
Latency Distribution
50% 2.05ms
75% 3.61ms
90% 6.00ms
99% 42.87ms
5393 requests in 1.00m, 1.04MB read
Socket errors: connect 0, read 5984, write 0, timeout 0
Requests/sec: 89.73
Transfer/sec: 17.79KB
Virtual Threads
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.96ms 6.78ms 103.99ms 97.99%
Req/Sec 93.91 374.23 2.52k 94.59%
Latency Distribution
50% 1.61ms
75% 2.77ms
90% 5.25ms
99% 13.76ms
7164 requests in 1.00m, 1.39MB read
Socket errors: connect 0, read 7509, write 0, timeout 0
Requests/sec: 119.23
Transfer/sec: 23.64KB
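For reference, the three strategies compared above can be created with the standard `Executors` factory methods. This is a minimal sketch of the comparison, not how Teeny actually wires its executor; `newVirtualThreadPerTaskExecutor()` requires Java 21+.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorStrategies {

    // The three strategies benchmarked above.
    public static ExecutorService cached() {
        return Executors.newCachedThreadPool();
    }

    public static ExecutorService workStealing() {
        return Executors.newWorkStealingPool();
    }

    public static ExecutorService virtual() {
        return Executors.newVirtualThreadPerTaskExecutor(); // Java 21+
    }

    // Run n no-op tasks on the given executor and return how many completed.
    public static int drain(ExecutorService pool, int n) throws InterruptedException {
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            pool.submit(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(drain(cached(), 100));
        System.out.println(drain(workStealing(), 100));
        System.out.println(drain(virtual(), 100));
    }
}
```

Swapping between these is a one-line change, which makes it easy to re-run the same wrk command against each strategy.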
Environment
👉 Wrk Tool
We should keep the following in mind when working with virtual threads:
Monopolization is explained in the "Virtual threads are useful for I/O-bound workloads only" section: when running long computations, the JVM cannot unmount and switch to another virtual thread until the current one terminates, because the current scheduler does not support preempting tasks.
This monopolization can lead to the creation of new carrier threads to execute other virtual threads. Creating carrier threads means creating platform threads, so there is a memory cost associated with it.
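To illustrate the I/O-bound case where virtual threads shine: a blocking call such as `Thread.sleep()` (or socket I/O) unmounts the virtual thread from its carrier, so thousands of blocked tasks can share a handful of platform threads. A CPU-bound busy loop, by contrast, would monopolize its carrier as described above. A minimal sketch (Java 21+):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class VirtualThreadBlocking {

    // Start many virtual threads that all block for 100 ms. Because the
    // blocking call unmounts each virtual thread from its carrier, they
    // finish almost concurrently instead of taking tasks * 100 ms on a
    // small platform-thread pool.
    public static boolean runBlockingTasks(int tasks) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            Thread.ofVirtual().start(() -> {
                try {
                    Thread.sleep(100); // blocking call: the carrier is released here
                } catch (InterruptedException ignored) {
                }
                latch.countDown();
            });
        }
        return latch.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runBlockingTasks(1_000));
    }
}
```

Replacing the `sleep` with a long computation is exactly the monopolization scenario quoted above: the scheduler cannot preempt it.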
Ready to upgrade to Java 21? 😬
I would like Teeny to have basic Prometheus metrics:
http_server_requests_seconds_sum{application="teeny",error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/store/pets",} 1.950221315
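A metric line like the one above is just the Prometheus text exposition format, so a first step could be as simple as rendering it from Teeny's own counters. A minimal sketch (the label set is illustrative; a real integration would more likely pull in the official Prometheus Java client or Micrometer):

```java
import java.util.Locale;

public class PrometheusFormat {

    // Render one sample in the Prometheus text exposition format,
    // shaped like the http_server_requests_seconds_sum line above.
    public static String sample(String name, String method, String uri,
                                int status, double value) {
        return String.format(Locale.ROOT,
                "%s{application=\"teeny\",method=\"%s\",status=\"%d\",uri=\"%s\",} %s",
                name, method, status, uri, value);
    }

    public static void main(String[] args) {
        System.out.println(sample("http_server_requests_seconds_sum",
                "POST", "/store/pets", 200, 1.950221315));
    }
}
```

Exposing these lines under a `/metrics` endpoint would be enough for a Prometheus server to scrape.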
Results against TestServer.java
alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost/user/1/details
Running 1m test @ http://localhost/user/1/details
8 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.37ms 1.10ms 33.87ms 97.64%
Req/Sec 2.00k 1.49k 4.88k 50.63%
Latency Distribution
50% 1.20ms
75% 1.36ms
90% 1.80ms
99% 4.15ms
32277 requests in 1.00m, 3.91MB read
Socket errors: connect 0, read 32923, write 0, timeout 0
Requests/sec: 537.07
Transfer/sec: 66.61KB
Plenty of scope for perf gains then! :)
Just profiled Teeny (IntelliJ profiler) and it looks like there's a deadlock: everything performs well until around the 30th concurrent request. I made a few adjustments and then got ~850 req/sec. This will be fun 😀
At some point I'd really like to get virtual threads integrated at certain levels of the JDK usage, etc. It would be interesting to find other areas to improve.
I've been testing different strategies to make Teeny more performant, and I've reached ~1K req/sec. IMO the most important component for processing more requests is the ThreadPoolExecutor. I just looked at this ThreadPoolExecutor.java and this
Thoughts?
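Since `ThreadPoolExecutor` tuning came up: its throughput under load mostly comes down to core/max pool sizes, the work queue, and the rejection policy. A hand-tuned sketch (the sizes here are illustrative, not Teeny's actual configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TunedPool {

    // Bounded queue so bursts queue up instead of failing, and
    // CallerRunsPolicy as back-pressure when the queue is full: the
    // submitting thread runs the task itself, which naturally slows
    // down the accept loop instead of dropping requests.
    public static ThreadPoolExecutor create() {
        int cores = Runtime.getRuntime().availableProcessors();
        return new ThreadPoolExecutor(
                cores,               // corePoolSize: kept warm
                cores * 4,           // maximumPoolSize: burst capacity
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = create();
        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 5_000; i++) {
            pool.execute(handled::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(handled.get()); // all 5000 tasks handled, none rejected
    }
}
```

The default `AbortPolicy` would instead throw `RejectedExecutionException` once the queue fills, which under a wrk run shows up as dropped connections.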
Let's get the perf up!
Just implemented JMH on the TeenyJson branch; these are the results so far:
Benchmark Mode Cnt Score Error Units
JsonBenchmarks.decodingBenchmark thrpt 5 30337.625 ± 60.575 ops/s
JsonBenchmarks.encodingBenchmark thrpt 5 199324.808 ± 6876.177 ops/s
Oops, it looks like if you run Teeny from IntelliJ, Teeny will underperform.
These results are from running Teeny from the CLI:
java -jar target/teenyhttpd-1.0.6.jar
alex@Alexs-MacBook-Pro ~ % wrk -d 5s http://localhost/health
Running 5s test @ http://localhost/health
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.41ms 40.40ms 352.74ms 96.50%
Req/Sec 8.50k 3.16k 11.05k 88.89%
16380 requests in 5.04s, 1.84MB read
Socket errors: connect 0, read 16380, write 0, timeout 0
Requests/sec: 3252.73
Transfer/sec: 374.82KB
Teeny update:
alex@Alexs-MacBook-Pro ~ % ./bombardier -d 5s http://localhost:8080
Bombarding http://localhost:8080 for 5s using 125 connection(s)
[=========================================================================================================================================================] 5s
Done!
Statistics Avg Stdev Max
Reqs/sec 15177.85 3110.70 21408.30
Latency 8.20ms 9.08ms 218.10ms
HTTP codes:
1xx - 0, 2xx - 75834, 3xx - 0, 4xx - 0, 5xx - 0
others - 400
Errors:
the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 372
dial tcp [::1]:8080: connect: connection refused - 27
write tcp 127.0.0.1:53575->127.0.0.1:8080: write: broken pipe - 1
Throughput: 2.98MB/s
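The "server closed connection before returning the first response byte" errors above suggest the server is dropping connections without announcing it. Sending a `Connection: close` header before closing is what bombardier asks for. A minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer` for illustration; Teeny's own response-writing code would be the real place to set this header:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectionClose {

    // Start a tiny server on an ephemeral port whose responses announce
    // that the connection will be closed after the body is sent.
    public static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok".getBytes();
            // Tell the client not to expect keep-alive before closing.
            exchange.getResponseHeaders().set("Connection", "close");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();
        int port = server.getAddress().getPort();
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
        server.stop(0);
    }
}
```

Alternatively, keeping connections alive (honoring keep-alive instead of closing per request) would likely help throughput far more than fixing the error message.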
It would be cool to use something like this to start performance benchmarking and improving the performance of TeenyHttpd.
Thoughts @alex-cova ?