dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Proposal: add comments to specific kinds of generated code disassembly #97060

Open BruceForstall opened 9 months ago

BruceForstall commented 9 months ago

There are many patterns of RyuJIT codegen where something interesting is known about the code when generating it, and that interesting information would be valuable if displayed as a comment on the generated code.

The JIT already generates some comments on generated code, e.g., for handles and strings:

IN0009: 00003B mov      ecx, 0x8C144B8      ; 'ImplementingObject'
...
IN0005: 000021 test     byte  ptr [0x0E539ED9], 1      ; global ptr
...
IN0033: 0000D8 mov      eax, dword ptr [0x0E53A254]      ; static handle

One example of additional comments that could be added is around PInvoke handling. For example, the JIT could add:

IN0001: 000014 lea      edi, [V12+0x4 ebp-0x40]            ; InlinedCallFrame + offset of vtable
IN0002: 000017 call     CORINFO_HELP_INIT_PINVOKE_FRAME
...
IN0019: 00006B mov      dword ptr [V12+0xC ebp-0x38], 0xCCEEF18   ; InlinedCallFrame + offset of call target
IN001a: 000072 mov      eax, esp
IN001b: 000074 mov      dword ptr [V12+0x10 ebp-0x34], eax    ; InlinedCallFrame + offset of call-site SP
IN001c: 000077 lea      eax, G_M32631_IG06
IN001d: 00007D mov      dword ptr [V12+0x14 ebp-0x30], eax    ; InlinedCallFrame + offset of return address
IN001e: 000080 mov      byte  ptr [esi+0x08], 0  ; FrameListRoot + offset of GC state
IN001f: 000084 call     [System.Runtime.InteropServices.ComWrappers:<GetIUnknownImplInternal>g____PInvoke|25_0(uint,uint,uint)]
IN0020: 00008A mov      byte  ptr [esi+0x08], 1  ; FrameListRoot + offset of GC state
IN0021: 00008E cmp      dword ptr [0x6B1E5904], 0  ; GC return trap check
IN0022: 000095 je       SHORT G_M32631_IG07
IN0023: 000097 call     CORINFO_HELP_STOP_FOR_GC
...
IN0030: 0000CB mov      ecx, bword ptr [V12+0x8 ebp-0x3C]   ; InlinedCallFrame + offset of next frame link
IN0031: 0000CE mov      dword ptr [esi+0x0C], ecx    ; FrameListRoot + offset of current Frame

Other examples might be when generating explicit null checks, prolog/epilog actions, locals zeroing, etc.

To implement this, we would want a mechanism to associate arbitrary comment text (or perhaps even multiple comments?) with any GenTree node. When a GenTree node or node tree is generated into a set of instructions (instrDesc), the set of comments would need to be associated with the generated instructions, and then output during disassembly. Perhaps these associations should be done using side tables, which could be DEBUG only, or perhaps easily enabled for non-DEBUG builds if that was determined to be useful.
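As a rough illustration of the side-table idea, here is a minimal, standalone C++ sketch. It is not RyuJIT code: `Node`, `Instr`, and `CommentTable` are hypothetical stand-ins for GenTree, instrDesc, and whatever table the JIT might actually use, and keying by an integer id is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for GenTree and instrDesc; the real JIT types differ.
struct Node  { int id; };
struct Instr { int id; std::string text; };

// Side table: comments live outside the node and instruction objects,
// so the whole table could be compiled out of non-DEBUG builds.
class CommentTable {
public:
    // Record one comment for a node; a node may carry several.
    void addNodeComment(const Node& n, std::string c) {
        assert(!c.empty() && "comment text must not be empty");
        nodeComments_[n.id].push_back(std::move(c));
    }

    // Copy every comment on the node to an instruction generated from it.
    void attach(const Node& n, const Instr& ins) {
        auto it = nodeComments_.find(n.id);
        if (it == nodeComments_.end()) return;
        auto& dst = insComments_[ins.id];
        dst.insert(dst.end(), it->second.begin(), it->second.end());
    }

    // Render one disassembly line, appending "; comment" text if any exists.
    std::string render(const Instr& ins) const {
        std::string line = ins.text;
        auto it = insComments_.find(ins.id);
        if (it != insComments_.end()) {
            for (std::size_t i = 0; i < it->second.size(); ++i)
                line += (i == 0 ? " ; " : ", ") + it->second[i];
        }
        return line;
    }

private:
    std::unordered_map<int, std::vector<std::string>> nodeComments_;
    std::unordered_map<int, std::vector<std::string>> insComments_;
};
```

Instructions with no entry in the table render unchanged, so existing disassembly output would only differ where a comment was explicitly attached.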

Comments?

@dotnet/jit-contrib

ghost commented 9 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: BruceForstall
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: -
kunalspathak commented 9 months ago

This will certainly be very useful for debugging purposes. I wanted to do something similar: associate Interval/RefPositions with the assembly code, so that register allocation can be tracked through to the generated code. There are a lot of other examples where annotating the code would help:

It should be fairly simple to do this once a comment is attached to GenTree nodes. Before genCodeForTreeNode(), save the lastIns emitted; once the method is done, all the instrDesc generated since then can be attached to the comment of that GenTree node. We can also find out the number of instructions generated and include that in the comment (e.g., Comment (5 instructions)) or something along those lines.
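The bracketing approach described here could look roughly like the following standalone C++ sketch. It is only an illustration under stated assumptions: `Emitter` and `genWithComment` are hypothetical names, the emitter is modeled as a simple vector of instruction strings rather than the JIT's real instrDesc list, and the saved vector size plays the role of the saved lastIns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified emitter: instructions in emission order,
// plus a parallel side table holding one comment slot per instruction.
struct Emitter {
    std::vector<std::string> instrs;
    std::vector<std::string> comments;

    void emit(std::string text) {
        instrs.push_back(std::move(text));
        comments.emplace_back(); // no comment by default
    }
};

// Remember where emission stood, run the node's codegen, then tag every
// instruction emitted in between with the comment and the instruction count.
std::size_t genWithComment(Emitter& e,
                           const std::string& comment,
                           const std::function<void(Emitter&)>& genNode)
{
    const std::size_t first = e.instrs.size(); // plays the role of the saved lastIns
    genNode(e);                                // stands in for genCodeForTreeNode()
    const std::size_t count = e.instrs.size() - first;

    assert(e.comments.size() == e.instrs.size());
    for (std::size_t i = first; i < e.instrs.size(); ++i)
        e.comments[i] = comment + " (" + std::to_string(count) + " instructions)";
    return count;
}
```

Instructions emitted before the call keep their empty comment slot, so only the instructions produced for that one node are annotated.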

BruceForstall commented 9 months ago

It's possible there will be multiple types of comments, e.g.: