Closed durban closed 3 months ago
I've locally modified Lincheck to not filter stacktraces. This way I've got the full stacktrace of the IndexOutOfBoundsException
:
The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
at java.base/java.util.Objects.checkIndex(Objects.java:361)
at java.base/java.util.ArrayList.get(ArrayList.java:427)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.getConcurrentActorCausesBlocking(ManagedStrategy.kt:263)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.failIfObstructionFreedomIsRequired(ManagedStrategy.kt:250)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPointRegular(ManagedStrategy.kt:327)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPoint(ManagedStrategy.kt:312)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.beforeAtomicMethodCall$lincheck(ManagedStrategy.kt:537)
at org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.atomic.AtomicReference.getAcquire(AtomicReference.java:334)
at dev.tauri.choam.internal.skiplist.SkipListMap.walkRight(SkipListMap.scala:104)
at dev.tauri.choam.internal.skiplist.SkipListMap.findPredecessor(SkipListMap.scala:808)
at dev.tauri.choam.internal.skiplist.SkipListMap.doRemove(SkipListMap.scala:602)
at dev.tauri.choam.internal.skiplist.SkipListMap.remove(SkipListMap.scala:176)
at dev.tauri.choam.data.Ttrie.unsafeDelRef(Ttrie.scala:150)
at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2(Ttrie.scala:160)
at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2$adapted(Ttrie.scala:160)
at dev.tauri.choam.core.Rxn$InterpreterState.loop(Rxn.scala:1400)
at dev.tauri.choam.core.Rxn$InterpreterState.interpretWithContext(Rxn.scala:1552)
at dev.tauri.choam.core.Rxn$InterpreterState.interpret(Rxn.scala:1545)
at dev.tauri.choam.core.Rxn.unsafePerform(Rxn.scala:260)
at dev.tauri.choam.data.TtrieModelTest$AbstractTestState.lookup(TtrieModelTest.scala:71)
at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution21.run(Unknown Source)
at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
at java.base/java.lang.Thread.run(Thread.java:840)
So it is indeed coming from Lincheck itself.
Hi @durban! Could you please check whether the issue still occurs with Lincheck 2.26?
I've tried 2.26, and it fails the same way. The (unfiltered) stacktrace is this:
The number next to an exception name helps you find its stack trace provided after the interleaving section
---
Exception stack traces:
#1: java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 3
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:64)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
at java.base/java.util.Objects.checkIndex(Objects.java:361)
at java.base/java.util.ArrayList.get(ArrayList.java:427)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.getConcurrentActorCausesBlocking(ManagedStrategy.kt:267)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.failIfObstructionFreedomIsRequired(ManagedStrategy.kt:254)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPointRegular(ManagedStrategy.kt:330)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.newSwitchPoint(ManagedStrategy.kt:315)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.beforeAtomicMethodCall$lincheck(ManagedStrategy.kt:538)
at org.jetbrains.kotlinx.lincheck.tran$f*rmed.java.util.concurrent.atomic.AtomicReference.getAcquire(AtomicReference.java:334)
at dev.tauri.choam.internal.skiplist.SkipListMap.walkRight(SkipListMap.scala:104)
at dev.tauri.choam.internal.skiplist.SkipListMap.findPredecessor(SkipListMap.scala:808)
at dev.tauri.choam.internal.skiplist.SkipListMap.doRemove(SkipListMap.scala:602)
at dev.tauri.choam.internal.skiplist.SkipListMap.remove(SkipListMap.scala:176)
at dev.tauri.choam.data.Ttrie.unsafeDelRef(Ttrie.scala:150)
at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2(Ttrie.scala:160)
at dev.tauri.choam.data.Ttrie.$anonfun$cleanupLater$2$adapted(Ttrie.scala:160)
at dev.tauri.choam.core.Rxn$InterpreterState.loop(Rxn.scala:1400)
at dev.tauri.choam.core.Rxn$InterpreterState.interpretWithContext(Rxn.scala:1552)
at dev.tauri.choam.core.Rxn$InterpreterState.interpret(Rxn.scala:1545)
at dev.tauri.choam.core.Rxn.unsafePerform(Rxn.scala:260)
at dev.tauri.choam.data.TtrieModelTest$AbstractTestState.lookup(TtrieModelTest.scala:71)
at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution21.run(Unknown Source)
at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
at java.base/java.lang.Thread.run(Thread.java:840)
So essentially the same as before, although some line numbers changed.
Hi @durban !
PR #277 should fix the IndexOutOfBoundsException
.
But even with this fix, the test still fails (I attached the log: output.txt).
As you might see, the log contains the following line:
java.lang.IllegalStateException: Non-determinism found. Probably caused by non-deterministic code (WeakHashMap, Object.hashCode, etc).
I suspect it might be caused by the usage of WeakReference
-s (there are plenty of them in the trace).
Nevertheless, it can be seen, that the first issue detected by Lincheck is an active spin lock:
= The algorithm should be non-blocking, but an active lock is detected =
This could be a sign of an actual live-lock, or a false positive bug report.
No matter what, you can effectively disable live-lock detection by setting larger value for parameter hangingDetectionThreshold
in ModelCheckingOptions
. Setting value 200
is enough to make the test pass:
def defaultModelCheckingOptions(): ModelCheckingOptions = {
....
increaseTimeout(new ModelCheckingOptions())
.addGuarantee(assumedAtomic)
.checkObstructionFreedom(true)
// this is the "fast" configuration from the Lincheck paper:
.iterations(30) // scenarios
.threads(2)
.actorsPerThread(3) // operations per thread
.invocationsPerIteration(1000) // run each scenario this much time (FIXME)
// end of "fast" configuration
.actorsBefore(2) // so that we don't work with empty data structures
.actorsAfter(1) // to have a chance of detecting inconsistent state left
.hangingDetectionThreshold(200)
}
If you think this live-lock is false positive, please report back so we can investigate further. The hack with disabling lock detection should make the test green for now.
For weak references support, we added a new tracking issue #279
Thanks for looking into this! I've looked at the output.txt you've attached, and some things are not clear to me:
The Non-determinism found
: I'm definitely using weakrefs, but they shouldn't be externally visible. I thought that model checking inspects the results of the @Operation
s. Does it also check the internals of a data structure? Also, if you look at the original output above, it contains the operation results, e.g., removeKey(ez): false
. In the output.txt, there are no results, e.g., lookup(zE)
. Is this normal?
There is also this part in the log:
java.lang.IllegalArgumentException: Internal error
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$ReplayModeLoopDetectorHelper.onNextExecution(ManagedStrategy.kt:1197)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$LoopDetector.onNextExecutionPoint(ManagedStrategy.kt:1051)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy$LoopDetector.onThreadFinish(ManagedStrategy.kt:1010)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategy.onFinish(ManagedStrategy.kt:379)
at org.jetbrains.kotlinx.lincheck.strategy.managed.ManagedStrategyRunner.onFinish(ManagedStrategy.kt:1251)
at org.jetbrains.kotlinx.lincheck.runner.TestThreadExecution26.run(Unknown Source)
at org.jetbrains.kotlinx.lincheck.runner.FixedActiveThreadsExecutor.testThreadRunnable$lambda$8(FixedActiveThreadsExecutor.kt:151)
at java.base/java.lang.Thread.run(Thread.java:829)
This looks like it shouldn't happen, regardless of the behavior of the tested code. Could this be the reason for the Non-determinism found
also?
Finally about the active lock is detected
: there should be no spinlocks, but I'll look into it. For the hangingDetectionThreshold
, what does the value 200
mean?
The Non-determinism found: I'm definitely using weakrefs, but they shouldn't be externally visible. I thought that model checking inspects the results of the @Operations. Does it also check the internals of a data structure?
Model checking mode generally relies on the fact that it can deterministically reproduce any execution of the program. It stores some data structure internally, that allows to do that, and the bytecode transformations ensure that we track all the events in the program that may cause non-determinism (like calls to Random
, thread switches, etc).
But for now we do not support WeakReference
-s. Their methods are non-deterministic too (because their results depend on GC). We will fix this problem later under #279.
Also, if you look at the original output above, it contains the operation results, e.g., removeKey(ez): false. In the output.txt, there are no results, e.g., lookup(zE). Is this normal?
What happens with your test, is that the model checker discovers what it thinks is an active lock. In general, when the MC discovers some bug, it attempts to run the same interleaving leading to a bug yet another time -- but this time collecting the program trace on the road. Because of non-determinism it cannot replay the same execution, and thus the second time it obtains different results.
For the hangingDetectionThreshold, what does the value 200 mean?
At least for now, the active lock detector suspects that program entered active lock whenever same operation hits specific code location N times (it usually happens when the code spins in a loop). The hangingDetectionThreshold
sets this N parameter.
Thanks, that makes a lot of things much more clear to me. I'll try to look into what could be this spinlock (or spinlock-like thing).
Hi @durban !
For your test, Lincheck actually reported false positive active lock due to a bug. PR #281 fixes this bug. Thanks to @ndkoval for spotting the root cause of the bug.
Thanks, that's good to hear! I'll recheck when the fix is released.
The new release lincheck-2.27 should fix this issue. @durban please re-open it if the problem persists.
I think an
IndexOutOfBoundsException
might be somehow thrown by Lincheck. I see in the code, that stacktraces are filtered; maybe that hides the exact source. EDIT: It is indeed thrown by Linchek, see below.Steps to reproduce (unfortunately I don't know how to minimize):
The
IndexOutOfBoundsException
looks like this:Now, this doesn't make a lot of sense, I suspect due to the stacktrace filtering. To the best of my knowledge, my code doesn't use
ArrayList
anywhere.For reference, more output: