dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.72k forks source link

Static field addresses are not deterministic when doing SPMI replay #53773

Open SingleAccretion opened 3 years ago

SingleAccretion commented 3 years ago

Reproduction steps:

1. Construct a trivial app that accesses a static:

public class Program
{
    static int _global;

    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    private static int Main()
    {
        return _global;
    }
}

2. Do an SPMI collection for that app. 3. Add some printf logging for the static address' value to morph.cpp here (or use -jitoption JitDump=*, but it did not work for me for some reason). 4. Replay using superpmi.exe, twice, log the output to two files. 5. git diff them. 6. You should see something like the following:

-Static field address is 2902599013844, method is Main
+Static field address is 2220318337444, method is Main

This can cause spurious diffs, because CSE uses these values, as was seen in #53748. The commit I reproduced this on is 44f050acc0f7791d6cd7ac772945444912bcf299.

cc @BruceForstall

category:eng-sys theme:super-pmi skill-level:expert cost:medium impact:medium

SingleAccretion commented 3 years ago

Debugging it some more, it looks like the root cause is this code: https://github.com/dotnet/runtime/blob/41af30ca291e0435083c0d4b5d70e2939e0dbc3d/src/coreclr/ToolBox/superpmi/superpmi-shared/methodcontext.cpp#L3592-L3602

SPMI tries to record values of static fields (all of them, even though only readonly ones are looked at by the Jit), so the value.fieldValue != (DWORD)-1 branch is hit practically always, and the changed temp returned.

This is necessary, since we must have the Jit look for the value now stored in the SPMI's map (replay_address), the original address will almost certainly make no sense in the replay process (in other words, dereferencing it will cause a segfault).

Looks like solving this might require extending the Jit-EE interface.

BruceForstall commented 3 years ago

I'll have to think about/debug this further. But typically what we would have to do is improve the recording side, in MethodContext::recGetFieldAddress().

BruceForstall commented 3 years ago

It seems like this is a problem in CSE for using the field address for any purpose that might cause diffs, not SuperPMI. E.g., it seems like multiple non-SuperPMI runs could also see asm diffs if the actual value of the address could affect codegen.

So I don't think this is a SuperPMI problem. SuperPMI does generally return the same addresses to the JIT as during record, assuming the JIT won't deref them, but things like the target code buffer address obviously must be able to change.

jakobbotsch commented 2 years ago

It seems like this is a problem in CSE for using the field address for any purpose that might cause diffs, not SuperPMI. E.g., it seems like multiple non-SuperPMI runs could also see asm diffs if the actual value of the address could affect codegen.

I think it is the case even without CSE using the field addresses, e.g. ARM64 can sometimes encode field addresses in fewer instructions depending on the specifics of their alignment/constant values. And of course there's a bunch of other things like static readonly fields that can affect the codegen depending on runtime specifics.

I think the problem here is that the contract is poorly specified. JIT assumes it can deref pointers to fields, and to make that work SPMI needs to return something else than what it saw at runtime. But JIT is also using the field address for codegen purposes. I think we need a new JIT-EE method to query the data for a field to make this work reliably (as a bonus, it may potentially help R2R/NAOT too).