JetBrains / lincheck

Framework for testing concurrent data structures
Mozilla Public License 2.0
545 stars 31 forks source link

Fix ConflatedBroadcastChannelLincheckTest in kotlinx.coroutines #273

Closed ndkoval closed 4 months ago

ndkoval commented 4 months ago

Fixes failing ConflatedBroadcastChannelLincheckTest; see the build below https://teamcity.jetbrains.com/buildConfiguration/KotlinTools_KotlinxLincheck_IntegrationTestWithKotlinxCoroutinesDevelopLinuxOnJava17/4485333?hideTestsFromDependencies=false&hideProblemsFromDependencies=false&expandBuildDeploymentsSection=false&expandBuildTestsSection=true&expandBuildProblemsSection=true

Investigation

There were two issues.

  1. Some classes had to be excluded from transformation and loaded by the system class loader.

Solution: ignore the corresponding classes in doNotTransform(..).

  1. When a spin lock was detected, suddenInvocationResult in ManagedStrategy was set to a non-null value. After that, all further inIgnoredSection(..) calls returned true (it checked for suddenInvocationResult being non-null, I don't know why), making it impossible for other threads to notice the spin-lock detection and also finish. This resulted in an infinite spin-lock and deadlock detection. However, the latter was ignored, as suddenInvocationResult was non-null. Notably, the "hanging" thread was still calling newSwitchPoint(..) periodically (there was a veeery long spin-lock). Then, after starting the next interleaving, the suddenInvocationResult field was reset to null, and the hanging threads, that were calling newSwitchPoint(..) periodically, published events related to the previous interleaving. This caused non-deterministic behavior in the model checker.

Solution: check whether suddenInvocationResult is non-null in the beginning of the analysis, removing the corresponding check from inIgnoredSection(..).

ndkoval commented 4 months ago

This issue was quite a journey. Thanks to @eupp for the help with the investigation and the review!