Patch: jacoco LabelFlowAnalyzer should ignore jazzer data flow trace calls

kmnls commented 1 year ago

Reason: JaCoCo produces wrong coverage because the coverage instrumentation happens after the hooks are set. This instrumentation should follow the JaCoCo algorithm but in fact they produce different results. Both algorithms inject control points into the end of each BB where the INVOKE happens. BBs where there are no INVOKEs are not marked. Let's check the case: BBs originally did not have any INVOKE but hooks are injected traceCmp callback. CoverageRecorder happily adds a control point into such a BB but JaCoCo will never know about this. As the result, all points after the mentioned will be misplaced in JaCoCo coverage.

Sample for test:

package com.example;

import java.util.Random;

public class Junit5Example1 {

    public static boolean earlyReturn = false;

    public static boolean logic(String input) {
        long random = new Random(42).nextLong();
        boolean cmp = input.startsWith("magicstring" + random);
        if (earlyReturn) {
            return true;
        }
        if (cmp
                && input.length() > 35
                && input.charAt(35) == 'C') {
            mustNeverBeCalled();
        }
        return true;
    }

    private static void mustNeverBeCalled() {
        throw new Error("mustNeverBeCalled has been called");
    }
}

All the lines after if (earlyReturn) will be colored wrongly because this BB will have an additional coverage point after traceCmp.

fmeum commented 1 year ago

Nice catch! We should definitely fix this.

Since we don't just apply a limited set of hooks before performing coverage instrumentation but also let the user add custom hooks, patching an exhaustive list into JaCoCo doesn't seem feasible.

Could you check whether making coverage instrumentation the first rather than the last pass in https://github.com/CodeIntelligenceTesting/jazzer/blob/0420f57d2b03eb93fa52eea1aa306e33c68288e8/src/main/java/com/code_intelligence/jazzer/agent/RuntimeInstrumentor.kt#L221-L232 fixes the problem for your reproducer?

Background: Our coverage instrumentation used to emit relatively complicated byte code that would trigger the tracing instrumentation, but this is no longer the case - we emit a single method call that shouldn't trigger any of the other instrumentation passes.

kmnls commented 1 year ago

It seems to me that all the user custom hooks are to intercept calls of some methods. BEFORE/AFTER/REPLACE are all toughly connected to the INVOKEs. The only different ones are those 'traces' which are bound to various opcodes Thus, there is no need to expect an extension of the list.

kmnls commented 1 year ago

I'll check the reverse of the hooks/coverage application and inform you.

kmnls commented 1 year ago

wrong.Junit5Example1.java.zip right.Junit5Example1.java.zip

kmnls commented 1 year ago

In my configuration for the exact project, from the first look, the proposed change of instrumentation order causes the same effect as the proposed patch. I have tried

    private fun instrument(internalClassName: String, bytecode: ByteArray, fullInstrumentation: Boolean): ByteArray {
        val classWithHooksEnabledField = if (Opt.conditionalHooks) {
            // Let the hook instrumentation emit additional logic that checks the value of the
            // hooksEnabled field on this class and skips the hook if it is false.
            "com/code_intelligence/jazzer/runtime/JazzerInternal"
        } else {
            null
        }
        return ClassInstrumentor(internalClassName, bytecode).run {
            if (fullInstrumentation) {
                coverageIdSynchronizer.withIdForClass(internalClassName) { firstId ->
                    coverage(firstId).also { actualNumEdgeIds ->
                        CoverageRecorder.recordInstrumentedClass(
                            internalClassName,
                            bytecode,
                            firstId,
                            actualNumEdgeIds,
                        )
                    }
                }
                // Hook instrumentation must be performed after data flow tracing as the injected
                // bytecode would trigger the GEP callbacks for byte[]. Coverage instrumentation
                // must be performed after hook instrumentation as the injected bytecode would
                // trigger the GEP callbacks for ByteBuffer.
                traceDataFlow(instrumentationTypes)
                hooks(includedHooks + customHooks, classWithHooksEnabledField)
            } else {
                hooks(customHooks, classWithHooksEnabledField)
            }
            instrumentedBytecode
        }
    }

fmeum commented 1 year ago

@kmnls That looks promising. Could you submit a PR with this alternative approach? It does seem simpler and not having to patch JaCoCo more than absolutely necessary will help us maintain our dependencies.

kmnls commented 1 year ago

See #711

CodeIntelligenceTesting / jazzer

Patch: jacoco LabelFlowAnalyzer should ignore jazzer data flow trace calls #707