Loom: Impact of large J9VMThread thread list

tajila commented 2 years ago

Loom proposes to introduce virtual threads to the JDK. Our current plan is to back each virtual thread with a J9VMThread (except that this J9VMThread will not be backed by a J9Thread/native thread).

The impact of this design is that there will be many more J9VMThreads with Loom enabled than before. We can expect tens of thousands of J9VMThreads or more. This may be noticeable in operations that walk the J9VMThread list, most notably release/acquireExclusiveVMAccess.

Are there ways to mitigate this problem?

Are there other concerns from a GC perspective?

tajila commented 2 years ago

@gacholio

tajila commented 2 years ago

Would it be possible to remove threads that are unmounted (AKA de-scheduled) from the vmthread list? These threads would not have VM access, and they cannot be in native code (EDIT: the exception is VirtualThread.park) or in a synchronized block. The result would be that the thread list only ever holds as many virtual threads as there are carrier threads (typically equal to the number of CPUs). The drawback is that it makes mounting/unmounting a virtual thread slower, as we would need to acquire a lock.
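
A minimal sketch of that idea, assuming hypothetical names (`SplitThreadList`, `VThread`, `listLock`) that are not OpenJ9 code. It only models the bookkeeping: mount and unmount take a lock to move a thread between two sets, and an exclusive-access request would only need to visit the mounted set.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model, not OpenJ9 code: VThread stands in for a J9VMThread.
final class SplitThreadList {
    static final class VThread {
        final String name;
        VThread(String name) { this.name = name; }
    }

    private final Object listLock = new Object();                 // lock taken on every mount/unmount
    private final Set<VThread> mounted = new LinkedHashSet<>();   // visited for exclusive access
    private final Set<VThread> unmounted = new LinkedHashSet<>(); // skipped for exclusive access

    // New virtual threads start out unmounted in this model.
    void add(VThread t) {
        synchronized (listLock) { unmounted.add(t); }
    }

    void mount(VThread t) {
        synchronized (listLock) {
            if (unmounted.remove(t)) mounted.add(t);
        }
    }

    void unmount(VThread t) {
        synchronized (listLock) {
            if (mounted.remove(t)) unmounted.add(t);
        }
    }

    // Number of threads an exclusive-access request would have to visit.
    int exclusiveAccessWalkLength() {
        synchronized (listLock) { return mounted.size(); }
    }

    int totalThreads() {
        synchronized (listLock) { return mounted.size() + unmounted.size(); }
    }
}
```

The extra cost mentioned above shows up as the `synchronized` block on every `mount` and `unmount` call.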

tajila commented 2 years ago

FYI @fengxue-IS

gacholio commented 2 years ago

Do unmounted threads have a java stack associated with them?

tajila commented 2 years ago

Yes, an unmounted thread still has a java stack and it must be walkable.

gacholio commented 2 years ago

Then I'm not sure how we could avoid walking them. What query/operation do you think doesn't need to consider them?

gacholio commented 2 years ago

It would be a simple enough matter to maintain two lists and move threads between them as they are mounted/unmounted if you can come up with a scenario.

tajila commented 2 years ago

> What query/operation do you think doesn't need to consider them?

I was hoping that acquiring exclusive VM access could avoid them. I believe that's the most common operation that walks the thread list.

gacholio commented 2 years ago

True, but it's most commonly used by the GC which will need to walk all of the threads.

It would be relatively simple for the GC to walk two lists for roots, so we could speed up the exclusive VM access portion.
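
Roughly, as a sketch with hypothetical names (`TwoListRootScan`, `ThreadRecord`) and stack references modelled as strings: the root scan visits both lists, while the exclusive-access path only looks at the active one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not OpenJ9 code.
final class TwoListRootScan {
    // Each thread's Java stack is modelled as a list of reference names.
    static final class ThreadRecord {
        final List<String> stackRoots;
        ThreadRecord(List<String> stackRoots) { this.stackRoots = stackRoots; }
    }

    // GC root scan: must visit both lists, because unmounted stacks are still walkable.
    static List<String> collectRoots(List<ThreadRecord> active, List<ThreadRecord> descheduled) {
        List<String> roots = new ArrayList<>();
        for (ThreadRecord t : active) roots.addAll(t.stackRoots);
        for (ThreadRecord t : descheduled) roots.addAll(t.stackRoots);
        return roots;
    }

    // Exclusive VM access: only threads that can hold VM access need to be visited.
    static int threadsToHalt(List<ThreadRecord> active, List<ThreadRecord> descheduled) {
        return active.size();
    }
}
```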

tajila commented 2 years ago

> It would be relatively simple for the GC to walk two lists for roots, so we could speed up the exclusive VM access portion.

Okay, that sounds promising. Thoughts @dmitripivkine @amicic ?

I was also thinking there could be additional benefits for splitting the thread list.

One list would be the active threads (list 1), the other would be de-scheduled virtual threads (list 2). List 2 would be much larger than list 1. We could also add a counter on threads in list 2 that records how many GCs they have been de-scheduled for (e.g. thread X has been inactive for the last 5 GCs). Could this be used to minimize the work that has to be done? In other words, if a thread has been de-scheduled for quite some time, maybe it doesn't need to be scanned as often?
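
The counter itself could look something like this sketch (the class and field names are made up for illustration). It only tracks the age; whether a collector can safely act on it is the open question.

```java
// Hypothetical per-thread bookkeeping, not OpenJ9 code.
final class DescheduledAgeCounter {
    private boolean mounted = true;
    private int gcsWhileDescheduled; // GCs seen since the last unmount

    void unmount() {
        mounted = false;
        gcsWhileDescheduled = 0;
    }

    void mount() {
        mounted = true;
        gcsWhileDescheduled = 0; // the stack may change again, so the age is reset
    }

    // Called once per GC cycle for every thread in list 2.
    void onGcCycle() {
        if (!mounted) gcsWhileDescheduled++;
    }

    int gcsWhileDescheduled() {
        return gcsWhileDescheduled;
    }
}
```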

gacholio commented 2 years ago

The GC could only avoid scanning if it knew for instance that the java stack pointed to no new space objects, so a scavenge could ignore it. Perhaps a third list?
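
As a rough sketch of what that could mean (hypothetical names throughout, and with the large simplifying assumption that a scavenge tenures every object it finds): a per-stack flag records whether an unmounted stack may still point into new space, and a scavenge skips stacks whose flag is clear.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not OpenJ9 code.
final class NurseryFilterSketch {
    enum Space { NEW, OLD }

    // Stack of an unmounted thread; it cannot change until the thread is mounted again.
    static final class FrozenStack {
        final List<Space> slots = new ArrayList<>(); // where each slot's referent lives
        boolean mayPointToNewSpace = true;           // conservative default at unmount
    }

    // Scavenge over de-scheduled stacks; returns how many stacks were actually scanned.
    static int scavenge(List<FrozenStack> descheduled) {
        int scanned = 0;
        for (FrozenStack s : descheduled) {
            if (!s.mayPointToNewSpace) continue; // the "third list": nothing here for a scavenge
            scanned++;
            // Simplifying assumption: every referent found is tenured by this scavenge.
            for (int i = 0; i < s.slots.size(); i++) s.slots.set(i, Space.OLD);
            s.mayPointToNewSpace = s.slots.contains(Space.NEW);
        }
        return scanned;
    }

    // On mount the stack becomes mutable again, so the flag must be reset.
    static void onMount(FrozenStack s) {
        s.mayPointToNewSpace = true;
    }
}
```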

dmitripivkine commented 2 years ago

First of all, a Global GC must scan all threads to discover roots. Yes, in Gencon we are interested in roots in the Nursery. However, I think it might be hard to maintain a "thread stack does not have roots in Nursery" property. And it might be really complicated for Balanced, where any region might be part of the collection set for a Partial GC.

tajila commented 2 years ago

@dmitripivkine Do you see benefit to splitting the threadlist at all? Do you have any perf concerns with the fact that there will be many more j9vmthreads?

DanHeidinga commented 2 years ago

How will we GC the Virtual Threads if we keep them as roots in a thread list? My understanding was that a virtualThread can be GC'd once it's not referenced so it's possible to have lots of these in the heap that aren't roots.

Can we find these threads by building a list of them while walking the heap and then adding their corresponding J9VMThread to the root set after discovery? ie: not have them in the J9VMThread list ever but let the GC naturally discover them?
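
To illustrate what I mean, here is a toy marking loop with made-up names (`DiscoverDuringMark`, `Obj`, `VirtualThreadObj`), not a description of how any OpenJ9 collector works. Virtual threads are ordinary heap objects; when marking reaches one, its stack slots are pushed as extra references. A virtual thread that nothing references is never reached, so neither it nor its stack keeps anything alive.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not OpenJ9 code.
final class DiscoverDuringMark {
    static class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // A heap object that also owns a stack whose slots must be treated as references.
    static final class VirtualThreadObj extends Obj {
        final List<Obj> stackSlots = new ArrayList<>();
        VirtualThreadObj(String name) { super(name); }
    }

    // Marks everything reachable from the given roots (e.g. carrier thread stacks).
    static Set<Obj> mark(List<Obj> roots) {
        Set<Obj> live = new HashSet<>();
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!live.add(o)) continue; // already marked
            work.addAll(o.fields);
            if (o instanceof VirtualThreadObj) {
                // Discovered a live virtual thread: its stack contributes new references.
                work.addAll(((VirtualThreadObj) o).stackSlots);
            }
        }
        return live;
    }
}
```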

dmitripivkine commented 2 years ago

> @dmitripivkine Do you see benefit to splitting the threadlist at all? Do you have any perf concerns with the fact that there will be many more j9vmthreads?

I think it makes sense to have active and de-scheduled lists. It helps with acquiring exclusive VM access and might help with further GC optimizations (handling of a permanent set of roots).

dmitripivkine commented 2 years ago

> How will we GC the Virtual Threads if we keep them as roots in a thread list? My understanding was that a virtualThread can be GC'd once it's not referenced so it's possible to have lots of these in the heap that aren't roots.
>
> Can we find these threads by building a list of them while walking the heap and then adding their corresponding J9VMThread to the root set after discovery? ie: not have them in the J9VMThread list ever but let the GC naturally discover them?

GC-ing virtual threads is a new concept for me. I don't know how virtual thread management is going to be organized, so I cannot estimate the cost. If it is supposed to be another clearable table, I need to know the details. Also, it would not be purely clearable (remove from the table if dead): discovering that a virtual thread is alive would require scanning it, which can obviously introduce new roots. Given the large size of the table, this might increase GC pauses significantly. Adding such a significant piece to all supported collectors will also take time. I am not sure it can be done for Java 19 GA.

amicic commented 2 years ago

Every J9VMThread has a GC thread-local struct (EnvironmentBase in OMR, and EnvironmentDelegate in OpenJ9). I don't know the exact sizes, but the OMR one is not a small struct; it is mostly dominated by various GC stats structs. Probably a couple of KB.

If we associate a J9VMThread with each virtual thread, we will associate an Env with each of them, too. Footprint could be an issue.

These structures have more things in them associated with GC threads (lots of things, no need to list them here) than with mutator threads (for example, allocation and barrier related). Since in Gencon, any mutator thread can become a GC thread (the main thread to be more specific), we have Env struct associated with each of them.

But we probably need only one Env per OS carrier thread for GC purposes (really only one of the virtual threads within that carrier thread can be involved in a GC as a GC thread). But for mutator purposes we might still need an Env for each virtual thread. Not sure; I have to check what exactly is mutator specific.

If so, we would need to split Env into GC and mutator components, and associate the GC component only with the carrier thread.
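
A sketch of that split, with hypothetical types (`GcEnv`, `MutatorEnv`, `Carrier`, `VirtualThreadState`) and an assumed 2 KB stats array standing in for the large GC part. These are not the real EnvironmentBase/EnvironmentDelegate layouts.

```java
// Hypothetical model, not the OMR/OpenJ9 Env structs.
final class EnvSplitSketch {
    // Large part: GC stats and other GC-thread state (size is an assumption).
    static final class GcEnv {
        final long[] stats = new long[256];
    }

    // Small part: allocation and barrier related state needed by every mutator.
    static final class MutatorEnv {
        long allocationCursor;
    }

    // One GcEnv per OS carrier thread.
    static final class Carrier {
        final GcEnv gcEnv = new GcEnv();
    }

    // One MutatorEnv per virtual thread; the GcEnv is borrowed from the carrier.
    static final class VirtualThreadState {
        final MutatorEnv mutatorEnv = new MutatorEnv();
        Carrier carrier; // null while unmounted

        GcEnv gcEnvOrNull() {
            return carrier == null ? null : carrier.gcEnv;
        }
    }
}
```

With this shape, the large part scales with the number of carrier threads and only the small part scales with the number of virtual threads.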

(Possibly, this separation is something we could benefit from even without virtual threads (with tangible savings in the presence of many mutators), since really only one mutator thread can be the main GC thread in a given GC. So we really need only one EnvGC component for all mutators, not one for each mutator.)

Edit: obviously, all worker GC threads need the Env GC component, although they seem to need the mutator one as well, since early after creation they end up allocating some Java thread object.

tajila commented 2 years ago

Closing as we have changed the design approach

eclipse-openj9 / openj9

Loom: Impact of large J9VMThread thread list #15057