dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.12k stars 4.7k forks source link

JIT: add option to print GC info interleaved with asm output #41647

Open BruceForstall opened 4 years ago

BruceForstall commented 4 years ago

Currently, when using COMPlus_JitGCDump/COMPlus_NgenGCDump, GC info is displayed in a block at the end of compilation.

Create options that allow the information to be interleaved into the asm dumped with COMPlus_JitDisasm/COMPlus_NgenDisasm. These options would be in addition to the current ones, which are still useful.

Interleaved output allows us to always add GC info into our asm diffs process. Currently, adding GC info into asm diffs requires enabling adding (back) instruction offsets into the asm output, because GC info is based on asm offsets. But adding these means that as soon as there is a single new instruction, or change in encoding size of an instruction, then all subsequent instructions in the function show up as a diff. This is really noisy. Thus, we rarely if ever pay attention to GC info diffs, which we should.

Note that the JIT dump does emit interleaved GC info as part of the "Instructions as they come out of the scheduler" phase. It would be better if the interleaved GC info dump suggested here come from decoding the fully generated GC info, the way the SOS !u -gcinfo command does it. Perhaps the code for !u -gcinfo could be borrowed for this (see https://github.com/dotnet/diagnostics/pull/403).

Changes should be made to jit-diff and possibly jit-analyze (https://github.com/dotnet/jitutils), and to superpmi.py, to use these new options.

category:implementation theme:asmdiffs skill-level:beginner cost:small impact:small

CarolEidt commented 4 years ago

@dotnet/jit-contrib I'm currently working on this, but would like to get some early feedback. For the first iteration of this, I am continuing to use the GC info represented in the JIT data structures, rather than decoding the GC info after it's generated. In addition, I'm modifying the GC dump to dump the info after the instruction that it pertains to, since it reflects the state after the instruction has been executed (I have always found it confusing that it printed it before). I'm also deleting some information that was redundant, and consolidating multiple updates into a single line. Here's a snippet of a disassembly (with COMPlus_JitDiffableDasm and a new COMPlus_JitDisasmWithGC=1:

       mov      rcx, gword ptr [rcx]
       ; gcrRegs +[rcx]
       call     System.Console:WriteLine(System.String,System.Object,System.Object)
       ; gcrRegs -[rax rcx rdx rbp r8]
                        ;; bbWeight=0.50 PerfScore 4.25
G_M17768_IG10:        ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
       mov      ecx, edi
       call     TreeNode:bottomUpTree(int):TreeNode
       ; gcrRegs +[rax]

Here's the 'before' snippet of the corresponding jitdump for those same instructions:

gcrReg +[rcx]
IN0021: 000093 mov      rcx, gword ptr [rcx]
New gcrReg live regs=00000000 {}
gcrReg -[rax]
gcrReg -[rcx]
gcrReg -[rdx]
gcrReg -[rbp]
gcrReg -[r8]
[D1FFAB1E] ptr arg pop  0
IN0022: 000096 call     System.Console:WriteLine(System.String,System.Object,System.Object)
                        ;; bbWeight=0.50 PerfScore 4.25
G_M17768_IG10:        ; func=00, offs=00009BH, size=0019H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
IN0023: 00009B mov      ecx, edi
New gcrReg live regs=00000001 {rax}
gcrReg +[rax]
[D1FFAB1E] ptr arg pop  0
IN0024: 00009D call     TreeNode:bottomUpTree(int):TreeNode

Here's the same snippet with my changes:

IN0021: 000093 mov      rcx, gword ptr [rcx]
       ; gcrRegs +[rcx]
IN0022: 000096 call     System.Console:WriteLine(System.String,System.Object,System.Object)
       ; gcrRegs -[rax rcx rdx rbp r8]
                        ;; bbWeight=0.50 PerfScore 4.25
G_M17768_IG10:        ; func=00, offs=00009BH, size=0019H, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
IN0023: 00009B mov      ecx, edi
IN0024: 00009D call     TreeNode:bottomUpTree(int):TreeNode
       ; gcrRegs +[rax]
BruceForstall commented 4 years ago

I like it!

I'm modifying the GC dump to dump the info after the instruction that it pertains to, since it reflects the state after the instruction has been executed (I have always found it confusing that it printed it before)

I've also often been confused by this. Does the new behavior match !u -gcinfo in SOS? Will all output of GC info in JIT dump be consistent here?

CarolEidt commented 4 years ago

Does the new behavior match !u -gcinfo in SOS?

I'm not sure - I'm not much of a user of SOS, but I'll check.

Will all output of GC info in JIT dump be consistent here?

Yes, that's the plan. The output methods are shared, at least for now, and I would presume we'd keep the same approach when/if we actually decode the GC info.

janvorli commented 4 years ago

It would be also nice to match where R2RDump puts it.

CarolEidt commented 4 years ago

It would be also nice to match where R2RDump puts it.

I'm not aware of R2RDump - can you point me to it?

BruceForstall commented 4 years ago

It's in the source tree as src\coreclr\src\Tools\r2rdump

CarolEidt commented 4 years ago

AFAICT, R2RDump simply dumps the GC info after the method (and doesn't seem to actually report ranges, just slots) I don't propose to implement interleaving of GC info for R2RDump, so I'm not sure in what way I would match R2RDump.

janvorli commented 4 years ago

R2RDump actually interleaves the stuff with disassembly. Here is an example that I've just created from S.P.C.:

void System.Runtime.CompilerServices.NullableAttribute..ctor(byte)
Id: 2
StartAddress: 0x0047EE10
Size: 114 bytes
UnwindRVA: 0x00B74D6C
Version:            1
Flags:              0x03 EHANDLER UHANDLER
SizeOfProlog:       0x0005
CountOfUnwindCodes: 2
FrameRegister:      None
FrameOffset:        0x0
PersonalityRVA:     0x3A749
UnwindCode[0]: CodeOffset 0x0005 FrameOffset 0x0000 NextOffset 0x0 Op 48

Debug Info
    Bounds:
    Native Offset: 0x0, Prolog, Source Types: StackEmpty
    Native Offset: 0x1B, IL Offset: 0x0000, Source Types: StackEmpty
    Native Offset: 0x1F, IL Offset: 0x0001, Source Types: CallInstruction
    Native Offset: 0x25, IL Offset: 0x0006, Source Types: StackEmpty
    Native Offset: 0x26, IL Offset: 0x0007, Source Types: StackEmpty
    Native Offset: 0x3D, IL Offset: 0x000e, Source Types: SourceTypeInvalid
    Native Offset: 0x6B, IL Offset: 0x0017, Source Types: StackEmpty
    Native Offset: 0x6C, Epilog, Source Types: StackEmpty

    Variable Locations:
    Variable Number: 1
    Start Offset: 0x0
    End Offset: 0x1B
    Loc Type: VLT_REG
    Register: RDX

    Variable Number: 0
    Start Offset: 0x0
    End Offset: 0x1B
    Loc Type: VLT_REG
    Register: RCX

    Variable Number: 1
    Start Offset: 0x1B
    End Offset: 0x6C
    Loc Type: VLT_STK
    Base Register: RBP
    Stack Offset: 24

    Variable Number: 0
    Start Offset: 0x1B
    End Offset: 0x6C
    Loc Type: VLT_STK
    Base Register: RBP
    Stack Offset: 16

  47ee10: 55                    push    rbp
                                UWOP_PUSH_NONVOL RBP(5)                                0

  47ee11: 48 83 ec 30           sub     rsp, 48
                                UWOP_ALLOC_SMALL 48                                0

  47ee15: 48 8d 6c 24 30        lea     rbp, [rsp + 48]
  47ee1a: 33 c0                 xor     eax, eax
  47ee1c: 48 89 45 f8           mov     qword ptr [rbp - 8], rax
  47ee20: 48 89 45 f0           mov     qword ptr [rbp - 16], rax
  47ee24: 48 89 4d 10           mov     qword ptr [rbp + 16], rcx
  47ee28: 89 55 18              mov     dword ptr [rbp + 24], edx
  47ee2b: 48 8b 4d 10           mov     rcx, qword ptr [rbp + 16]
                                RCX is live
  47ee2f: ff 15 bb f3 b8 ff     call    qword ptr [0xe1f0]    // void System.Attribute..ctor() (METHOD_ENTRY_DEF_TOKEN)
                                RCX is dead
  47ee35: 90                    nop
  47ee36: 48 8b 4d 10           mov     rcx, qword ptr [rbp + 16]
                                RCX is live
  47ee3a: 48 89 4d f8           mov     qword ptr [rbp - 8], rcx
  47ee3e: b9 01 00 00 00        mov     ecx, 1
                                RCX is dead
  47ee43: ff 15 47 23 b8 ff     call    qword ptr [0x1190]    // byte[] (NEW_ARRAY)
                                RAX is live
  47ee49: 48 89 45 f0           mov     qword ptr [rbp - 16], rax
  47ee4d: 48 8b 55 f0           mov     rdx, qword ptr [rbp - 16]
                                RDX is live
  47ee51: 33 c9                 xor     ecx, ecx
  47ee53: 3b 4a 08              cmp     ecx, dword ptr [rdx + 8]
  47ee56: 72 05                 jb      0x47ee5d
  47ee58: e8 08 b9 bb ff        call    0x3a765               // JMP [0x1028]: RNG_CHK_FAIL (HELPER)
                                RAX is dead
  47ee5d: 8b c1                 mov     eax, ecx
  47ee5f: 48 8d 54 02 10        lea     rdx, [rdx + rax + 16]
                                RDX is live
                                RDX is dead
  47ee64: 8b 4d 18              mov     ecx, dword ptr [rbp + 24]
  47ee67: 88 0a                 mov     byte ptr [rdx], cl
  47ee69: 48 8b 55 f8           mov     rdx, qword ptr [rbp - 8]
                                RDX is dead
                                RDX is live
  47ee6d: 48 8d 4a 08           lea     rcx, [rdx + 8]
                                RCX is live
  47ee71: 48 8b 55 f0           mov     rdx, qword ptr [rbp - 16]
  47ee75: ff 15 b5 21 b8 ff     call    qword ptr [0x1030]    // WRITE_BARRIER (HELPER)
                                RCX is dead
                                RDX is dead
  47ee7b: 90                    nop
  47ee7c: 48 8d 65 00           lea     rsp, [rbp]
  47ee80: 5d                    pop     rbp
  47ee81: c3                    ret
CarolEidt commented 4 years ago

OK, it seems that R2RDump also does the right thing and reports the deltas after the instruction. However, it individually reports each register or stack location (my changes will report them collectively). Also, the JIT dump reports variables on the stack by their JIT name (e.g. ; GC ptr vars +{V27}). I don't see any value in unifying these.

CarolEidt commented 4 years ago

The interleaving of GC info was implemented in #42552, but work remains to incorporate its use in jitutils.

We should also consider whether additional information should be in the assembly dump with COMPlus_JitDisasmWithGC so that any diff in the GC info is readily visible in an assembly diff.