Integration with JVM Virtual Thread (project LOOM).

fvasco commented 1 year ago

I am opening this placeholder issue to discuss integration with JVM virtual threads, implemented in Project Loom (currently a preview feature in JDK 19 and 20).

I made a simple, CPU-bound demo project (see the java and kotlin branches) written in both Java and Kotlin style, plus the same Kotlin code using a trivial integration with virtual threads.

In my initial test, with padding=100, virtual threads appear to perform better than Kotlin's default dispatcher; however, it is reasonable to expect that other use cases with similar results may turn up in the future.

[Image: padding100]

It may be possible to add a Dispatchers.Virtual on the JVM, or to move Dispatchers.Default and/or Dispatchers.IO to virtual threads (when available).

Currently all the useful methods are preview APIs, so only an experimental API can be proposed.

ninja- commented 1 year ago

VirtualThreadCoroutineDispatcher is an interesting implementation, but I am afraid there's a major flaw in the logic. Dispatcher API is used only for running blocking parts of the code. By using Thread.startVirtualThread, all you're doing is putting that blocking part of the code in the queue to be executed when an OS thread becomes available, and it's not much different than using a normal ExecutorService.
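For reference, a dispatcher of the kind described here can be sketched in a few lines. This is a minimal sketch and not the code from the demo project: the name VirtualThreadPerTaskDispatcher is hypothetical, and it assumes JDK 21+ (or JDK 19/20 with --enable-preview) and kotlinx.coroutines on the classpath.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import kotlin.coroutines.CoroutineContext

// Hypothetical name (an assumption for illustration).
// Every dispatched block is handed to a brand-new virtual thread.
object VirtualThreadPerTaskDispatcher : CoroutineDispatcher() {
    override fun dispatch(context: CoroutineContext, block: Runnable) {
        Thread.startVirtualThread(block)
    }
}

// Reports whether the coroutine body actually ran on a virtual thread.
fun runsOnVirtualThread(): Boolean = runBlocking {
    withContext(VirtualThreadPerTaskDispatcher) {
        Thread.currentThread().isVirtual
    }
}

fun main() {
    println("ran on a virtual thread: ${runsOnVirtualThread()}")
}
```

Note that with this shape each resumption after a suspension point is dispatched again and therefore lands on a fresh virtual thread; the coroutine itself is still suspended and resumed by the Kotlin state machine, which is the limitation pointed out above.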

You're not actually using the native park/unpark feature of virtual threads that could be compared to Kotlin's own dispatcher-based execution...only doing so would provide some interesting benchmark data.

I am not sure at this point if the proper implementation can fit into Dispatcher interface, unless as some recursive monster.

A better starting point, to me, seems to be the current runBlocking function, which uses a thread loop as well as LockSupport.park / LockSupport.unpark (the internet says Java planned to add special code to LockSupport to park virtual threads optimally, but I am not sure whether they did so).
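As a point of reference for the park/unpark idea, here is a minimal sketch using only JDK APIs, assuming JDK 21+. As far as JEP 444 describes it, LockSupport.park called from a virtual thread parks that virtual thread and releases its carrier thread. The function name parkAndResume is hypothetical, and this is not how runBlocking is implemented.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.LockSupport

// Hypothetical helper: parks a virtual thread until another thread unparks it.
fun parkAndResume(): Boolean {
    val ready = AtomicBoolean(false)
    val resumed = AtomicBoolean(false)

    val waiter = Thread.startVirtualThread {
        // park() may return spuriously, so the condition is re-checked in a loop.
        while (!ready.get()) {
            LockSupport.park()
        }
        resumed.set(true)
    }

    // Publish the condition first, then wake the waiter.
    ready.set(true)
    LockSupport.unpark(waiter)

    waiter.join()
    return resumed.get()
}

fun main() {
    println("virtual thread resumed: ${parkAndResume()}")
}
```

Setting the flag before calling unpark means the waiter cannot miss the wake-up: either it sees the flag and never parks, or the unpark permit makes the pending park return.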

axelfontaine commented 1 year ago

With Virtual Threads going GA with JDK 21 in September, now seems like an ideal time to revisit this...

fvasco commented 1 year ago

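A simple adapter for using virtual threads in coroutines, when available:

import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asCoroutineDispatcher

// Looked up reflectively so the code still compiles and loads on JDKs without virtual threads.
private val virtualDispatcher: CoroutineDispatcher? by lazy {
    try {
        (Executors::class.java.getMethod("newVirtualThreadPerTaskExecutor").invoke(null) as ExecutorService)
            .asCoroutineDispatcher()
    } catch (e: ReflectiveOperationException) {
        // NoSuchMethodException: the JDK lacks the method. InvocationTargetException:
        // the method exists but threw, e.g. when preview features are disabled on JDK 19/20.
        null
    } catch (e: UnsupportedOperationException) {
        null
    }
}

// Null when virtual threads are unavailable.
public val Dispatchers.Virtual: CoroutineDispatcher? get() = virtualDispatcher
// Falls back to Dispatchers.IO when virtual threads are unavailable.
public val Dispatchers.VirtualOrIO: CoroutineDispatcher get() = Virtual ?: IO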

fvasco commented 12 months ago

Project Loom is now in an LTS release (JDK 21), any update?

revintec commented 11 months ago

I think we should do deeper integration. the implementation fvasco showed would make the code harder to debug, as stacktraces and/or breakpoints would be a mess, and it doesn't utilize virtual threads' biggest strength: VM integration for easier function coloring and debugging. no offence, but this level of integration provides almost no benefits

here are my thoughts:

- [function coloring] on the JVM, we should be able to mix non-suspending calls with suspending ones, and effectively deprecate runBlocking(...). the underlying suspending function can utilize the JVM's native continuation directly. this greatly simplifies both the compiler and the resulting bytecode, also making the bytecode more accessible to tools like asm and native-image (no more non-reducible loops etc.)
- at least on the JVM, remove much of the stacktrace recover/enhance/sanitize code, to align with the JVM. this greatly simplifies the kotlinx.coroutines code, also making the stacktrace more accessible (no more truncated stacktraces in native exceptions, no more hard-to-find callsites <- this one is really killing me)
- optionally, deprecate much of the Dispatcher code, and use the JVM ones
- create a virtual thread only when using launch(...) or async(...); all other operations (like withContext(...), coroutineScope(...) etc.) should not switch threads. this would make kotlinx.coroutines more aligned with Java, and greatly improve observability and/or debuggability

in a post-Loom world, kotlinx.coroutines is NOT deprecated, but simplified, while still providing the following benefits:

- structured concurrency, supervisorScope etc.
- a more accessible API surface, channel/producer etc.
- fine-grained cancellation
- KMP support

I'd like to implement the aforementioned features, but I'm not sure they would be accepted/merged into kotlinx.coroutines, so I'm waiting for a later time to revisit

fvasco commented 11 months ago

@revintec, I agree with you: my integration example is trivial, and better work could be done by modifying the Default dispatcher. On the other hand, I am curious about your proposal: Loom isn't a silver bullet, and each advertised benefit can be restated as a downside.

> [function coloring] in JVM, we should be able to mix non-suspending calls with suspending ones

You should do that only in a virtual thread. Using suspending methods (= blocking methods) inside an event loop isn't a good idea. How can you tell whether list.indexOf(item) is blocking or not? The list can be an ArrayList or a JPA list. Moreover, a list can contain URI or URL elements (URL with its infamous equals implementation).
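To make the list.indexOf(item) concern concrete, here is a small sketch. SlowKey is a hypothetical class whose equals() sleeps, standing in for something like java.net.URL or a lazily loaded JPA entity; nothing at the call site shows that the lookup blocks.

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical element type (an assumption for illustration):
// its equals() hides a blocking operation.
class SlowKey(val id: Int) {
    override fun equals(other: Any?): Boolean {
        equalsCalls.incrementAndGet()
        Thread.sleep(20) // simulated blocking I/O
        return other is SlowKey && other.id == id
    }

    override fun hashCode(): Int = id

    companion object {
        // Counts how many blocking equals() calls were made.
        val equalsCalls = AtomicInteger()
    }
}

// Returns the index found and the number of blocking equals() calls it cost.
fun indexOfWithHiddenBlocking(): Pair<Int, Int> {
    SlowKey.equalsCalls.set(0)
    val list = arrayListOf(SlowKey(1), SlowKey(2), SlowKey(3))
    val index = list.indexOf(SlowKey(3)) // looks like a plain in-memory lookup
    return index to SlowKey.equalsCalls.get()
}

fun main() {
    val (index, calls) = indexOfWithHiddenBlocking()
    println("index=$index, blocking equals() calls=$calls")
}
```

On an event-loop thread each of those equals() calls would stall every other task scheduled on that loop, which is the point being made here.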

> the underlying suspending function can utilize JVM's native continuation directly

"JVM's native continuation" is available only on a virtual thread; it doesn't work on MainCoroutineDispatcher, for example (even if the main thread is a virtual thread).

> making the stacktrace more accessible(no more truncated stacktraces in native exceptions, no more hard to find callsites

Stack traces are expensive: suspending a virtual thread requires more RAM, is slower, and has a negative impact on GC time. Providing this behaviour by default may not fit all use cases. Moreover, code that works perfectly with Kotlin coroutines can throw StackOverflowError with this kind of Loom integration, due to the longer stack traces.

revintec commented 11 months ago

> Using suspending methods (= blocking methods) inside an event loop isn't a good idea. How can you tell whether list.indexOf(item) is blocking or not? The list can be an ArrayList or a JPA list. Moreover, a list can contain URI or URL elements (URL with its infamous equals implementation).

that is correct, however not every problem can be solved, especially not in one single step

- users can already mix blocking and non-blocking code, think runBlocking{...}. different users have different use cases (and not all code is newly written, they have to integrate with the vast amount of legacy code/libs), so we can't deprive users of the ability to use these (though less ideal) features. we can't even technically do that (actually, under the current programming model, we can't even distinguish blocking calls from non-blocking ones, and you surely don't wanna change the programming model), and we're just making it (unnecessarily) awkward to write (think suspending Iterator/Closeable etc.)
- not all blocking code is marked suspend, think Thread.sleep(...) and more upcoming Panama/JNI code. mixing blocking and non-blocking code surely is not a good idea, but sadly we're not living in an ideal world, think URL.equals(...) as you've mentioned. if you make these blocking calls in kotlinx.coroutines, they hang the underlying thread/event loop, thus requiring withContext(IO){...} etc., but that complicates the code and adds mental burden for users (esp. newcomers). and this is precisely what virtual threads promise to solve -- abstracting this function coloring away [TODO insert all the weird deadlock and non-intuitive scheduling issues here]
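A minimal sketch of the behaviour being relied on here, assuming JDK 21+ and using only JDK APIs: a plain Thread.sleep(...) on a virtual thread parks that virtual thread instead of pinning an OS thread, so many blocking calls can overlap without any withContext(IO){...} wrapper. The function name and the task counts are made up for illustration, and this is not a benchmark.

```kotlin
import java.util.concurrent.Executors

// Hypothetical helper: runs `tasks` blocking sleeps, one virtual thread each,
// and returns the total wall-clock time in milliseconds.
fun sleepOnVirtualThreads(tasks: Int, sleepMillis: Long): Long {
    val start = System.nanoTime()
    val executor = Executors.newVirtualThreadPerTaskExecutor()
    try {
        repeat(tasks) {
            // A plain blocking call, with no suspend modifier and no dispatcher switch.
            executor.execute { Thread.sleep(sleepMillis) }
        }
    } finally {
        // close() waits for all submitted tasks to finish (JDK 19+).
        executor.close()
    }
    return (System.nanoTime() - start) / 1_000_000
}

fun main() {
    val elapsed = sleepOnVirtualThreads(tasks = 1_000, sleepMillis = 100)
    println("1000 blocking sleeps of 100 ms finished in $elapsed ms")
}
```

Run sequentially on one thread the same sleeps would take about 100 seconds; how close the virtual-thread version gets to a single sleep interval depends on the machine and the JDK scheduler.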

> "JVM's native continuation" is available only on a virtual thread; it doesn't work on MainCoroutineDispatcher, for example (even if the main thread is a virtual thread).

that is a valid point, but it's not a design problem, just an implementation detail. the MainCoroutineDispatcher doesn't have to be implemented the same way as other dispatchers, it can surely (continue to) be integrated with event loops

> Stack traces are expensive: suspending a virtual thread requires more RAM, is slower, and has a negative impact on GC time. Providing this behaviour by default may not fit all use cases. Moreover, code that works perfectly with Kotlin coroutines can throw StackOverflowError with this kind of Loom integration, due to the longer stack traces.

that is also an implementation detail. the StackOverflowError is not a valid point: users can tweak the stack size if necessary, virtual threads can remove the stack size limitation, and users can still cause a StackOverflowError even with kotlinx.coroutines. the performance/GC argument seems valid, though it's not backed by evidence or benchmarks; these are (currently) hypotheses, and surely depend on the workload/use case. it is a trade-off: if the benefit outweighs the loss, we should at least add a way to enable it

revintec commented 11 months ago

as for virtual threads' implementation details, here are my thoughts, though they may never be integrated into Loom:

- virtual threads currently use a chunked stack, so they can always allocate a new stack chunk, effectively removing the stack-size limitation
- according to Ron's presentation, it seems virtual threads' stack copying/moving is already sufficiently fast (note: I've not tested this statement). however, if this copying/moving proves to be costly, there is a way to switch stacks instead of copying/moving them. I've done sth. similar in C/C++, and there are implementations in Fiber (Windows) and UMS. it just requires more design/code/testing, so we're not seeing it implemented currently, but we could, eventually

efemoney commented 10 months ago

Is this being looked at? It's not clear what the status of Java 21 / Loom support is.

binarrii commented 10 months ago

Looking forward to the outcome of this work!

Kotlin / kotlinx.coroutines

Integration with JVM Virtual Thread (project LOOM). #3606