eclipse-ee4j / grizzly

Grizzly
https://eclipse-ee4j.github.io/grizzly
Other
147 stars 68 forks source link

Introduce VirtualThreadExecutorService for Virtual Threads Support #2185

Open mz1999 opened 1 year ago

mz1999 commented 1 year ago

This PR introduces the VirtualThreadExecutorService, extending the capabilities of Grizzly's thread pool to support Java's virtual threads. This new executor service leverages the lightweight, user-mode threads introduced in recent Java versions, offering potential performance improvements for certain workloads.

grizzly has four IO Strategies, and each IOStrategy will involve two types of thread pools:

VirtualThreadExecutorService is more suitable for worker-thread IOStrategy scenarios where Selector threads use Platform threads and the worker ThreadPool uses virtual threads. workerthread-strategy

Here's how you create a worker ThreadPool for grizzly in glassfish:

// Use standard Grizzly thread pool
workerExecutorService = GrizzlyExecutorService.createInstance(
        configureThreadPoolConfig(networkListener, threadPool));
transport.setWorkerThreadPool(workerExecutorService);

Similarly, the GrizzlyExecutorService can be replaced with VirtualThreadExecutorService so that the worker thread uses a virtual thread.

workerExecutorService = VirtualThreadExecutorService.createInstance(
        configureThreadPoolConfig(networkListener, threadPool));
transport.setWorkerThreadPool(workerExecutorService);
kofemann commented 1 year ago

Though this is a good move I have two concerns. First, Java 21 is not LTS, and probably is not a target for grizzly-based applications. The second, and more serious one, is that grizzly internally uses a lot of ThreadLocal variables. Thus the impact of switching to VirtualThread is unclear. So, probably updating the code base to use ScopedValue should be done prior to using VirtualThread.

Luckily, grizzly allows easily to use of custom executor service as a worker thread pool. This allows us to play with VirtualThreads without making it a strong dependency right away.

kofemann commented 1 year ago

Ok, I have to correct here myself, as 21 is of course expected to be LTS. However the second concern is still valid.

arjantijms commented 1 year ago

21 is of course expected to be LTS.

Just a small nit, but Java 21 itself is certainly not an LTS. Core Java SE doesn't have any concept of LTS. The distribution offered by Oracle may be LTS.

mnriem commented 12 months ago

@kofemann Can you make it so the build can use Java 11 and exclude these changes as well as Java 21 ea and include these changes.

OndroMih commented 10 months ago

Hi, @kofemann,

Though this is a good move I have two concerns. ~First, Java 21 is not LTS, and probably is not a target for grizzly-based applications.~ The second, and more serious one, is that grizzly internally uses a lot of ThreadLocal variables. Thus the impact of switching to VirtualThread is unclear. So, probably updating the code base to use ScopedValue should be done prior to using VirtualThread.

The impact of using ThreadLocal in virtual threads is only if it's used like a cache for threads in a thread pool. Meaning that it caches data created by one task and used by subsequent tasks executed on the same thread later. Otherwise it effectively works like a ScopedValue within the scope a virtual thread, and thus within the scope of a single task executed on the thread. Switching to virtual threads might incur a performance penalty because objects previously cached for all tasks on the same thread would be cached only for a single task and then thrown away. But all should work, with this possibly negligible penalty. If the penalty is not negligible, ScopedValue wouldn't help. It would be necessary to make the cache global and thread-safe.

OndroMih commented 10 months ago

@mz1999 Your solution is very similar to what I experimented with here: https://github.com/OmniFish-EE/glassfish-grizzly-virtual-threads-pool

I was also thinking of adding some limitation on the maximum virtual threads running at a time, similar to maxPoolSize for a platform thread pool. This would allow to limit the load on the server and allow for back pressure, slowing down communication from clients. Greg Wilkins from Webtide writes about this in this article https://webtide.com/if-virtual-threads-are-the-solution-what-is-the-problem/ (in The Cost of Waiting section). I think it would be enough to add a semaphore in the execute method and allow only X parallel executions of the internalExecutorService.execute(command) method. If the maximum number of threads is reached, it would block the kernel thread but that's OK - that's exactly the moment when clients need to wait to write data to a connection, which slows them down and allows for back pressure.

What do you think?

mz1999 commented 10 months ago

@mz1999 Your solution is very similar to what I experimented with here: https://github.com/OmniFish-EE/glassfish-grizzly-virtual-threads-pool

I was also thinking of adding some limitation on the maximum virtual threads running at a time, similar to maxPoolSize for a platform thread pool. This would allow to limit the load on the server and allow for back pressure, slowing down communication from clients. Greg Wilkins from Webtide writes about this in this article https://webtide.com/if-virtual-threads-are-the-solution-what-is-the-problem/ (in The Cost of Waiting section). I think it would be enough to add a semaphore in the execute method and allow only X parallel executions of the internalExecutorService.execute(command) method. If the maximum number of threads is reached, it would block the kernel thread but that's OK - that's exactly the moment when clients need to wait to write data to a connection, which slows them down and allows for back pressure.

What do you think?

I completely agree with you. Although virtual threads are very lightweight and can be easily created in large numbers, request processing is not just about creating virtual threads, it is also about consuming system resources (CPU, memory, I/O, etc.) or external resources (e.g. databases) while executing the processing tasks. As the number of concurrently executing tasks increases, there comes a point where one or more resources become bottlenecks, leaving additional virtual threads idle while waiting for access to those resources. Therefore, imposing a limit on the maximum number of concurrent virtual threads is indeed a prudent strategy.

To address this, I've introduced a semaphore to control the maximum number of concurrent tasks. When this limit is exceeded, a RejectedExecutionException is thrown, signaling to the client that the executor has reached saturation and cannot accept any more tasks.

kofemann commented 10 months ago

Generally, I have nothing against VirtualThread and we are looking in our project to make use of them as well. Nonetheless, I am less optimistic about using them in grizzly, which internally uses ThreadLocals in some places:

$ git grep -n "new ThreadLocal"                                                                                                                            
modules/comet/src/main/java/org/glassfish/grizzly/comet/CometContext.java:101:    protected final static ThreadLocal<Request> REQUEST_LOCAL = new ThreadLocal<>();
modules/grizzly/src/main/java/org/glassfish/grizzly/ThreadCache.java:36:    private static final ThreadLocal<ObjectCache> genericCacheAttr = new ThreadLocal<>();
modules/grizzly/src/main/java/org/glassfish/grizzly/Writer.java:145:        private static final ThreadLocal<Reentrant> REENTRANTS_COUNTER = new ThreadLocal<Reentrant>() {
modules/grizzly/src/main/java/org/glassfish/grizzly/threadpool/Threads.java:25:    private static final ThreadLocal<Boolean> SERVICE_THREAD = new ThreadLocal<>();
modules/grizzly/src/test/java/org/glassfish/grizzly/AsyncWriteQueueTest.java:456:            final ThreadLocal<Integer> reentrantsCounter = new ThreadLocal<Integer>() {
modules/http-server/src/test/java/org/glassfish/grizzly/http/server/NIOOutputSinksTest.java:783:            ThreadLocal<Integer> reentrantsCounter = new ThreadLocal<Integer>() {
modules/http-servlet/src/main/java/org/glassfish/grizzly/servlet/AsyncContextImpl.java:75:    private final ThreadLocal<Boolean> isDispatchInScope = new ThreadLocal<Boolean>() {
modules/http-servlet/src/main/java/org/glassfish/grizzly/servlet/AsyncContextImpl.java:92:    private final ThreadLocal<Boolean> isStartAsyncInScope = new ThreadLocal<Boolean>() {
modules/http-servlet/src/main/java/org/glassfish/grizzly/servlet/ServletInputStreamImpl.java:42:    private static final ThreadLocal<Boolean> IS_READY_SCOPE = new ThreadLocal<>();
modules/http-servlet/src/main/java/org/glassfish/grizzly/servlet/ServletOutputStreamImpl.java:42:    private static final ThreadLocal<Boolean> CAN_WRITE_SCOPE = new ThreadLocal<>();
modules/http-servlet/src/main/java/org/glassfish/grizzly/servlet/WebappContext.java:144:    private final ThreadLocal<DispatchData> dispatchData = new ThreadLocal<>();
modules/http/src/main/java/org/glassfish/grizzly/http/util/CookieUtils.java:73:    public static final ThreadLocal<SimpleDateFormat> OLD_COOKIE_FORMAT = new ThreadLocal<SimpleDateFormat>() {
modules/http/src/main/java/org/glassfish/grizzly/http/util/FastHttpDateFormat.java:55:    private static final ThreadLocal<SimpleDateFormatter> FORMAT = new ThreadLocal<SimpleDateFormatter>() {
modules/http2/src/test/java/org/glassfish/grizzly/http2/NIOOutputSinksTest.java:998:            ThreadLocal<Integer> reentrantsCounter = new ThreadLocal<Integer>() {

Most of them are trivial to fix. This probably has to be done before VirtualThreads is made as a standard option.

OndroMih commented 10 months ago

Generally, I have nothing against VirtualThread and we are looking in our project to make use of them as well. Nonetheless, I am less optimistic about using them in grizzly, which internally uses ThreadLocals in some places.

Most of them are trivial to fix. This probably has to be done before VirtualThreads is made as a standard option.

@kofemann, I agree with you that thread locals should be reviewed. But this new VirtualThreadExecutorService is not going to be the default option, users will have to enable it. Release notes and javadoc can mention that this feature is experimental and should be used with caution. Or do you suggest that Grizzly shouldn't provide even any optional support for virtual threads until all threadlocals are addressed?

mnriem commented 8 months ago

@mz1999 @OndroMih What is needed to get this across the finish line?

OndroMih commented 7 months ago

Although I'm not a committer, this looks good to me now.

However, I think it's not OK that this PR moves the Java baseline to Java 21. All future versions of Grizzly would require Java 21 to run. I think it's better to move the VT code to a separate maven module which is built with Java 21 as the baseline, and the core module is built with Java 21 but with Java 11 as the source and target version. Then it would still be possible to use Grizzly with Java 11 with the default executor.

On top of this, there are 2 outstanding things to address but they could be addressed later:

Until these 2 are addressed, this executor can be released as an experimental executor for virtual threads. After they are addressed, it can be a generally available VT executor, and even might become the default one for Java 21 some day.

More details about ThreadLocals

Acording to the previous comment https://github.com/eclipse-ee4j/grizzly/pull/2185#issuecomment-1787016453, there are a few ThreadLocal variables used as cache:

Thread cache:

Some variables potentially used for caching - if not used for caching, may remain as is:

Thread-unsafe formatters cached for performance purposes - these should be replaced by global DateTimeFormatter instances, which is thread-safe and can be reused by multiple threads: