ben-manes closed this issue 4 months ago
Interestingly, this passed on JDK 20-ea:
https://github.com/ben-manes/caffeine/actions/runs/3529461192/jobs/5920532925
I profiled it and found that it throws NoClassDefFoundError and, I presume, retries indefinitely. The cause appears to be a failed transform:
Could not initialize class org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.ForkJoinPool
void java.lang.Error.<init>(String)
void java.lang.LinkageError.<init>(String)
void java.lang.NoClassDefFoundError.<init>(String)
void com.github.benmanes.caffeine.cache.BoundedLocalCache.performCleanUp(Runnable)
void com.github.benmanes.caffeine.cache.BoundedLocalCache$PerformCleanupTask.run()
void com.github.benmanes.caffeine.lincheck.AbstractLincheckCacheTest$$Lambda$3254+0x00000008029f7140.74472028.execute(Runnable)
If I remove the offending code, the test passes. That code is an optimization that more aggressively resubmits the maintenance work onto the common pool if work remains; otherwise it waits for the next triggering action. I don't know why other JDK versions don't suffer from this problem, and it breaks only on JDK 19 with Lincheck 2.16.
/**
 * Performs the maintenance work, blocking until the lock is acquired.
 *
 * @param task an additional pending task to run, or {@code null} if not present
 */
void performCleanUp(@Nullable Runnable task) {
  evictionLock.lock();
  try {
    maintenance(task);
  } finally {
    evictionLock.unlock();
  }
  // If work remains, eagerly resubmit onto the common pool instead of
  // waiting for the next triggering action.
  if ((drainStatusOpaque() == REQUIRED) && (executor == ForkJoinPool.commonPool())) {
    scheduleDrainBuffers();
  }
}
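The rescheduling idea can be illustrated outside of Caffeine. This is a minimal, hypothetical sketch, not Caffeine's implementation: the class `BatchDrainer`, its `BATCH` limit, and its queue of pending `Runnable`s are assumptions. Each run drains a bounded batch under a lock and, if work remains and the executor is the common pool, resubmits itself; with any other executor it leaves the leftovers for the next triggering call.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of "drain under a lock, then resubmit if work remains".
final class BatchDrainer {
  static final int BATCH = 16; // hypothetical per-run work limit

  private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
  private final ReentrantLock lock = new ReentrantLock();
  private final AtomicInteger completed = new AtomicInteger();
  private final Executor executor;

  BatchDrainer(Executor executor) {
    this.executor = executor;
  }

  /** Enqueues work; a caller later triggers performCleanUp(). */
  void add(Runnable work) {
    pending.add(work);
  }

  int completed() {
    return completed.get();
  }

  int remaining() {
    return pending.size();
  }

  /** Runs at most BATCH pending tasks, blocking until the lock is acquired. */
  void performCleanUp() {
    lock.lock();
    try {
      for (int i = 0; i < BATCH; i++) {
        Runnable work = pending.poll();
        if (work == null) {
          break;
        }
        work.run();
        completed.incrementAndGet();
      }
    } finally {
      lock.unlock();
    }
    // Work is left over: resubmit eagerly, but only onto the shared common pool.
    if (!pending.isEmpty() && (executor == ForkJoinPool.commonPool())) {
      executor.execute(this::performCleanUp);
    }
  }
}
```

With a caller-runs executor such as `Runnable::run`, one call runs 16 of 100 queued tasks and stops; with `ForkJoinPool.commonPool()`, the resubmissions continue until the queue is empty.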
Oh, I missed this exception from the class's static initialization:
Exception java.lang.IllegalAccessError: class org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.ForkJoinPool (in unnamed module @0x1add17cc) cannot access class jdk.internal.vm.SharedThreadContainer (in module java.base) because module java.base does not export jdk.internal.vm to unnamed module @0x1add17cc [in thread "FixedActiveThreadsExecutor@849727439-1"]
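The two messages fit how the JVM handles a failed static initializer. A minimal sketch, using a hypothetical nested class `Broken` that simulates the failure by throwing an IllegalAccessError from its static field initializer: the first access rethrows that error as-is (the JLS initialization procedure wraps only non-Error exceptions in ExceptionInInitializerError), the class is then marked as erroneous, and every later access fails with NoClassDefFoundError ("Could not initialize class ...") without rerunning the initializer. Retrying therefore cannot succeed.

```java
// Hypothetical demo of how a failed static initializer is reported.
final class InitFailureDemo {
  static final class Broken {
    // Simulates the shaded ForkJoinPool failing while creating its common pool.
    static final Object VALUE = create();

    private static Object create() {
      throw new IllegalAccessError("simulated failure during static initialization");
    }
  }

  /** Touches Broken and returns the simple name of whatever was thrown. */
  static String touch() {
    try {
      return String.valueOf(Broken.VALUE);
    } catch (Throwable t) {
      return t.getClass().getSimpleName();
    }
  }
}
```

Calling `InitFailureDemo.touch()` twice returns "IllegalAccessError" and then "NoClassDefFoundError".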
I can make the test fail eagerly by adding a call to ForkJoinPool.commonPool() in the test class's constructor. That led me to open jdk.internal.vm and jdk.internal.access, after which the tests pass.
Hi @ben-manes!
So, in short, it seems the error is caused by lazy static initialization in concurrent threads leading to a deadlock. Interesting. This behaviour is likely due to cyclic class dependencies, though it is not yet clear why exactly this happens.
Hi @alefedor,
The summary is that ForkJoinPool now imports classes from jdk.internal.access and jdk.internal.vm. This is likely related to virtual threads. When ASM rewrites it into a shaded copy, the module system restricts access to these classes, which causes the initialization of the common pool to fail. As the common pool is a static instance, this failure happens during class initialization, so the entire ForkJoinPool class fails to load and a linkage error is thrown. Unfortunately, Lincheck blindly swallows all Throwable exceptions, whereas Error types should always be propagated and never handled directly. This results in the test retrying forever without reporting what went wrong.
The important fix is to always rethrow java.lang.Error types so that the test terminates early and reports the failure, as this exception type is meant to indicate that the JVM is in an unrecoverable state. Then, if future runtimes require additional modules, users can quickly diagnose the problem and make the necessary changes.
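A minimal sketch of that fix, assuming a hypothetical `Retry.retry` helper rather than Lincheck's actual code: the loop may catch Throwable to record recoverable failures, but any Error is rethrown immediately, so a NoClassDefFoundError ends the run on the first attempt instead of being retried forever.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper; not Lincheck's implementation.
final class Retry {
  private Retry() {}

  /** Calls action up to maxAttempts times, retrying exceptions but never Errors. */
  static <T> T retry(Callable<T> action, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Throwable last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (Error e) {
        throw e; // unrecoverable: fail fast so the failure is reported
      } catch (Throwable t) {
        last = t; // presumed recoverable: remember it and try again
      }
    }
    throw new IllegalStateException("Failed after " + maxAttempts + " attempts", last);
  }
}
```

Catching Error in its own clause before Throwable keeps the intent explicit even if the generic handler later grows logging or bookkeeping.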
The user-side change is to add --add-opens directives for jdk.internal.vm and jdk.internal.access. As ForkJoinPool is listed in your transformations, it seems this should be documented in your README for Java 9+. This is less pressing, as it is merely outdated documentation.
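For reference, the directives take the form `--add-opens java.base/jdk.internal.vm=ALL-UNNAMED` and `--add-opens java.base/jdk.internal.access=ALL-UNNAMED` on the test JVM's command line (opening a package also exports it at run time). A small, hypothetical helper such as `OpensCheck.missingOpens` could report which of those packages are still closed, so a misconfigured build fails with a clear message; this is a sketch under that assumption, not something Lincheck provides.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: lists java.base packages not opened to this code's module.
final class OpensCheck {
  private OpensCheck() {}

  static List<String> missingOpens(List<String> packages) {
    Module javaBase = Object.class.getModule();
    // When run from the classpath, this is the class loader's unnamed module.
    Module self = OpensCheck.class.getModule();
    return packages.stream()
        .filter(pkg -> !javaBase.isOpen(pkg, self))
        .collect(Collectors.toList());
  }
}
```

With both flags present, `missingOpens(List.of("jdk.internal.vm", "jdk.internal.access"))` should return an empty list; any name it returns is a package the test JVM still needs to open.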
The problem should be resolved with the changes for #136. I've also created an issue about propagating Errors (#258).
Hi, @ben-manes! It has taken a while to address the issue, but the recent 2.30 release should've fixed it. Could you please check?
Oh yes, this is fixed. I am on 2.29 now, as I ran into a bug where Lincheck terminates itself when it hangs:
Gradle suite > Gradle test > com.github.benmanes.caffeine.lincheck.CaffeineLincheckTest$BoundedLincheckTest > modelCheckingTest FAILED
org.jetbrains.kotlinx.lincheck.LincheckAssertionError:
= The execution has hung, see the thread dump =
I reported it in https://github.com/JetBrains/lincheck/issues/311, but my simplified reproduction was misleading, so I'll update that issue's title.
After upgrading from 2.15, a previously passing test hangs and times out on CI. I was able to reproduce this locally on JDK 19 (both 11 and 17 passed). Using jps to find the pid and jstack to capture the stack trace, I could see that it was always doing work within Lincheck methods. Sorry that I cannot provide more insight.
https://github.com/ben-manes/caffeine/actions/runs/3528817208/jobs/5919342189