com-lihaoyi / cask

Cask: a Scala HTTP micro-framework. Cask makes it easy to set up a website, backend server, or REST API using Scala
https://com-lihaoyi.github.io/cask/
Other
525 stars 55 forks source link

Virtual Thread support #125

Open He-Pin opened 3 months ago

He-Pin commented 3 months ago

Quick Summary: 230.51 / S (Worker Thread)

VS

1545.18 / S (Virtual Threads)

Motivation: Fix https://github.com/com-lihaoyi/cask/issues/90

I did some analyse on cask, and found it's using undertow for now. and the undertow itself is using Object lock as a lock in the http://www.jboss.org/xnio codebase, which we can do some contribution there to remove the monitor and migrating to ReentrantLock. Which should be better than waiting next Java LTS (maybe Java 25).

So, for now, I think we can just add the Virtual Thread support in cask, and then do some contribution upstream or migrating to another network layer maybe Netty?

The current changes is quite straightfoward.

Modifcation:

  1. Update the dependecies, Mill, libraries, Scala
  2. Because I'm using Windows 11, and https://github.com/com-lihaoyi/cask/pull/124 cause some issue, so I have to rename/delete it locally for now.
  3. I fixed some hint problem to make the code looks better before starting ,mainly inside Main.scala, others can follow after this PR.
  4. Add VirtualThreadSupport which only works on the Runtime where Virtual Thread is available, otherwise, it returns null..
  5. Add NewThreadPerTaskExecutor which will start a new Virtual Thread for every Command/Runnable, the Undertow needs this.
  6. Add VirtualThreadBlockingHandler which works like BlockingHandler but delegates to NewThreadPerTaskExecutor in 5 , instead of the connections' worker executor.
  7. Add a cask.virtualThread.enabled system property with expected value true|false, and only using the Virtual Thread to run Routes if and only if both the Runtime supported and cask.virtualThread.enabled is set to true. I'm not exposing an executor or etc thing, because that's would not be the philosophy of cask, but hiding the runtime behavior behind of cask.virtualThread.enabled, as I was not knowing cask is running with undertow.
  8. Do the some testing with ab, I changed a code with Thread.sleep(1000) to simulates.
    @cask.get("/")
    def hello(): String = {
    Thread.sleep(1000) // Simulate a slow endpoint
    "Hello World! Hello World! Hello World!"
    }

Results: For the first run, I think it works just as expected, I did some assertion about the thread type Thread.isVirtual too during the testing.

Quick Summary: 230.51 / S (Worker Thread) VS 1545.18 / S (Virtual Threads)

My Laptop : i9-14900hx + 64G RAM

Result with Connection Worker Thread Java 21:

C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        38 bytes

Concurrency Level:      10000
Time taken for tests:   43.381 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1740000 bytes
HTML transferred:       380000 bytes
Requests per second:    230.51 [#/sec] (mean)
Time per request:       43381.358 [ms] (mean)
Time per request:       4.338 [ms] (mean, across all concurrent requests)
Transfer rate:          39.17 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.2      0      16
Processing:  1435 19619 10914.5  19634   38933
Waiting:     1011 19598 10928.7  19604   38933
Total:       1435 19619 10914.6  19634   38933

Percentage of the requests served within a certain time (ms)
  50%  19634
  66%  25433
  75%  29280
  80%  31199
  90%  35040
  95%  36965
  98%  37949
  99%  37957
 100%  38933 (longest request)

VS

Result with Virtual Threads on Java 21

C:\Users\hepin>ab -n 10000 -c 10000 "http://localhost:8080/"
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /
Document Length:        38 bytes

Concurrency Level:      10000
Time taken for tests:   6.472 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      1740000 bytes
HTML transferred:       380000 bytes
Requests per second:    1545.18 [#/sec] (mean)
Time per request:       6471.734 [ms] (mean)
Time per request:       0.647 [ms] (mean, across all concurrent requests)
Transfer rate:          262.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.2      0      16
Processing:  1576 2120 202.6   2175    2527
Waiting:     1005 1631 364.6   1517    2279
Total:       1576 2120 202.6   2175    2527

Percentage of the requests served within a certain time (ms)
  50%   2175
  66%   2255
  75%   2306
  80%   2311
  90%   2327
  95%   2337
  98%   2341
  99%   2342
 100%   2527 (longest request)

Future Actions:

  1. Add testing about the NewThreadPerTaskExecutor and testing the Routes is running inside a VT when all setup.
  2. Add benchmarks with gathing/ab or doing it with Github Action (nightly build)
  3. Add Github Actions for Java 21 and Virtual thread.
  4. Adjust the code with feedback.
lihaoyi commented 3 months ago

Left some comments on the code. Thanks for looking into this!

Some high-level things to do:

  1. Include virtual thread configuration in a dedicated example in the examples/ folder, and then similarly include it in the doc-site. That would help ensure that it remains discoverable for people who want to try it out, which I expect will be an increasing number as Java 21+ is adopted.

  2. Embed your benchmark logic into the build system somehow, so other people can reproduce it. You can download and cache the ab binary via requests.get/os.write in Mill and then define a small ScalaModule to spin up Mill and run the benchmark with and without virtual threads. This will easily let people reproduce the benchmark or make adjustments to it as necessary

  3. Once you've got your benchmarking logic setup, run the benchmarks with/without virtual threads on a few of our other examples: perhaps on minimalApplication as a tiny example, and todo as a big example, and staticFiles as a non-compute bound example. We expect the benefits to be not as dramatic as with your synthetic example using Thread.sleep, but it is still good to know whether there is any improvement, no improvement, or even regression. That would help us advise users what kind of use cases would benefit from using virtual threads and which ones won't

  4. How much of the improvement is due to virtual threads, and how much is due to the NewThreadPerTaskExecutor we began using? We should run a benchmark with NewThreadPerTaskExecutor enabled but virtual threads disabled to verify that the virtual threads are the benefit, and not just the swapping out of the Executor strategy. Does the default one also spawn a NewThreadPerTask, or is it using a fixed thread pool of some sort?

He-Pin commented 3 months ago

Great, thanks for the quic and detailed review .

I will update this after work, we do see 10~15pt performance improvement and 5pt RAM reduce at internal stress testing, that numbers vary due to different workloads, but in most case,VT is a good choice ,and Nima is using virtual thread by default now.

He-Pin commented 3 months ago

Update: As @lihaoyi once said , we need comparing the performance with and witout virtual thread, so I have to screen the executor, I tested it locally, it works well.

Even the backing scheduler of Virtual thread is not the default ForkJoinPool, it still works.

VirtualThread[#91,cask-handler-executor-virtual-thread-0]/runnable@pool-1-thread-1
VirtualThread[#93,cask-handler-executor-virtual-thread-1]/runnable@pool-1-thread-2

With this setup, I think I can continue other part of this PR.

object Compress extends cask.MainRoutes {

  protected override val handlerExecutor: Executor = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())

  @cask.decorators.compress
  @cask.get("/")
  def hello(): String = {
    println(Thread.currentThread())
    "Hello World! Hello World! Hello World!"
  }

  initialize()
}
lihaoyi commented 3 months ago

@He-Pin let me know when this is ready for review, looking forward to using virtual threads in cask!

He-Pin commented 3 months ago

@lihaoyi I will continue it this weekend, but currently, I need a PR [https://github.com/com-lihaoyi/cask/pull/124] get merged, because I'm using Windows 11, and I don't have a mac/linux for now.

lihaoyi commented 3 months ago

@He-Pin merged it

lihaoyi commented 2 months ago

Bump on this @He-Pin :)

He-Pin commented 2 months ago

@lihaoyi Sorry for being late, We encountered a virtual thread deadlocking at work, related to the classloader. When the virtual thread and the other platform thread(not the carrier thread) both try to load the same classes or a virtual thread is been notified but there is no carrier thread(been pinned) to run the virtual thread, then deadlock.

So we just changed the classloader, both in the JVM cpp and the lib.

But, cask is a library, So the best we can do is add some documents about this case, and a simple way is limiting the Max concurrent Virtual threads number.

I'm sorry for not updating this soon, I was a little busy at work and we had this issue and fixed it with JVM modification.

So what do you think about :

  1. add some documentation about this.
  2. add some best practices about virtual threads.
lihaoyi commented 2 months ago

@He-Pin I think documentation and best practices are fine. We do not have the power to change the JVM, so best we can do is tell people if they want to use JVM virtual threads with Cask what the best way of doing so is

lihaoyi commented 1 month ago

Bump on this. @He-Pin if you're not able to pick this up, I'll put it up again in the next set of bounties

He-Pin commented 1 month ago

Sorry for the delay, I was working with our internal JVM team to workaround the synchronization in the Classloader, and another issue I currently encounter is the https://github.com/com-lihaoyi/mill/issues/3168 on Windows.

I will continue this this week, did spend sometime to learn the toolchian.

He-Pin commented 1 month ago

Because privateLookupIn starts in Java 9, and Cask currnetly support java8, so I have to fallback to Reflect I think.

lihaoyi commented 1 month ago

Got it, if you're still on it I'll leave it to you then!

Feel free to bump the required JVM version of necessary. We dont need to support Java 8 forever, e.g. requests-scala already moved to Java 11. Java is already on 21 or 22, so even jumping to Java 17 would be reasonable