JetBrains / lincheck

Framework for testing concurrent data structures
Mozilla Public License 2.0
589 stars 34 forks source link

Static class initialization code is not guaranteed to run in ignored section #419

Open eupp opened 1 month ago

eupp commented 1 month ago

From what I understand from the Lincheck codebase, we have an intent to maintain the invariant that static class initialization blocks clinit are always executed in ignored sections:

https://github.com/JetBrains/lincheck/blob/800f590ef453c3eab5c87b7264c29e580060cec4/src/jvm/main/org/jetbrains/kotlinx/lincheck/transformation/LincheckClassVisitor.kt#L92

However, it looks like that currently this invariant can be broken due to a complicated interaction of instrumented and non-instrumented code.

Consider the following example:

// instrumented class
class A {
    var x: Int = 0

    fun inc() { x++ }
}

// non-instremented class
class B {
    companion object {
        @JvmField
        val a = A().apply { inc() }
    }

    // <B::clinit>:
    //     NEW A
    //     GETFIELD A::x
    //     INC
    //     PUTFIELD A::x
}

// instrumented class
class C {
    fun foo(): Int {
        return B::a.x
    }
}

// code from Lincheck test (instrumented)
fun test() {
    C().apply { foo() }
}

This is what will happen when the test will be executed:

  1. Object C will be created and its method foo() will be called.
  2. During execution of foo(), class loading of B will be requested, and its <clinit> block will be executed.
  3. However, because B itself is not instrumented, its <clinit> block will not be wrapped into ignored section by the Lincheck instrumentation.
  4. Then B::<clinit> will call A::foo(), which is instrumented, and because the code is not in ignored section, it will be tracked by Lincheck.

This situation lead to various problems:

In the particular case of #376 we had NoClassDefFoundError, occured because of earlier ExceptionInInitializerError thrown from <clinit>. In the case of that issue we had the following concrete classes from the example above:

To see where HashSet is used in StackTraceElement look at the source code.

eupp commented 1 month ago

There are several possible ways to resolve this issue.

(1) Hard-coded solution for #376

We can handle the specific execution pattern occured in #376 involving StackTraceElement class. This is fine as a quick short-term solution, but I think that in long-term we need more robust and general solution.

(2) Instrument <clinit> of all loaded classes

To avoid the problem, we can instrument <clinit> blocks of all loaded classes, in order to wrap them in ignored sections.

Note that in case of lazy class re-transformation procedure employed by ManagedStrategy, we would have to ensure that the class is re-transformed before it is initialized.

We already have a similar mechanism, achieved with the help of ensureClassHierarchyIsTransformed and ensureObjectIsTransformed methods.

This mechanism utilizes the fact that managed strategy can intercept field reads and writes, and invoke class bytecode transformation at this point, mimicking the semantics of Java class initialization: https://docs.oracle.com/javase/specs/jls/se23/html/jls-12.html#jls-12.4

But it looks like currently our implementation of ManagedStrategy is not fully aligned with the Java class initialization semantics, so we will need to carefully review this part of the code.

Potential problem with this approach is that it may induce performance overhead, because it would require re-transformation of a large list of classes.

(3) Do not instrument java.util by default

The problem can be partly mitigated if we disable instrumentation of java.util classes.

In general, the fact that we do not instrument almost everything from java., but still instrument java.util is a bit confusing. After all, the java.util classes, like HashSet, are quite ubiquitous and are used almost everywhere. So even though java. classes are not instrumented themself, they still likely use classes from java.util (like HashSet), and thus their internals are still analyzed by Lincheck.

Perhaps, one of the reasons why java.util classes are instrumented, is that we want to be able to test classes from java.util.concurrent, like ConcurrentHashMap. In such cases, we can optionally enable instrumentation of some subset of java.util classes.

(4) Ensure invariants about instrumented and non-instrumented code interaction

In general, to avoid issues like the one with <clinit> described above, it would be good to establish certain invariants on how instrumented and non-instrumented code may interact, to arrive at a simpler mental model.

I would propose the following invariant:

We can implement a simple call-graph analysis to verify this invariant.

Note that this solution will also imply solution (3).

eupp commented 1 month ago

CC @ndkoval

eupp commented 2 weeks ago

423 implements temporary "quickfix" solution (1) for this problem.

eupp commented 2 weeks ago

As for solution (3), after another round of discussion with @ndkoval we decided that we cannot completely forbid analysis of java.util classes, and thus we should always assume at least some of the classes from this package are analyzed by the Lincheck.

eupp commented 2 weeks ago

As such, solution (4) also cannot be applied in its current form. The proposed invariant that "non-instrumented code can only call methods from non-instrumented classes" is too strong, and, unfortunately, we realistically need weaker and more complicated invariant.

eupp commented 2 weeks ago

Therefore, we can propose solution (5).

(5) Wrap calls of non-instrumented methods into ignored sections

We can wrap calls of non-instrumented methods, called from instrumented methods, into ignored sections. This approach ensures that they will be not analyzed, even if they itself call some methods from instrumented classes.

Additionally, as an optimization, we can wrap only non-instrumented methods that may call instrumented methods. If we are sure that some non-instrumented method never calls any instrumented method, there is no need to wrap it into ignored section.

This approach would still require some form of call-graph analysis.