dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.16k stars 4.72k forks source link

jit disasm is confusing for R2R helpers with different entry points #69635

Closed am11 closed 1 year ago

am11 commented 2 years ago

Repro:

static class Foo
{
    public static int Bar { get; } = 1;
    static Foo() { } // even the empty cctor will force it to emit the helper
                     // call twice in each static property's getter.
}

crossgen'd with:

crossgen2 test.dll @refs.rsp --out test.r2r.dll \
    --codegenopt NgenDisasm='*' --codegenopt JitDiffableDasm=1 --parallelism 1 --optimize

emits the following asm for get_Bar():

; Assembly listing for method Foo:get_Bar():int
; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix
; ReadyToRun compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 8

G_M39382_IG01:
       push     rax
                                                ;; size=1 bbWeight=1    PerfScore 1.00
G_M39382_IG02:
       call     [CORINFO_HELP_READYTORUN_STATIC_BASE]
       call     [CORINFO_HELP_READYTORUN_STATIC_BASE]
       mov      eax, dword ptr [rax+52]
                                                ;; size=15 bbWeight=1    PerfScore 8.00
G_M39382_IG03:
       add      rsp, 8
       ret      
                                                ;; size=5 bbWeight=1    PerfScore 1.25

; Total bytes of code 21, prolog size 1, PerfScore 12.35, instruction count 6, allocated bytes for code 21 (MethodHash=a7166629) for method Foo:get_Bar():int
; ============================================================

CORINFO_HELP_READYTORUN_STATIC_BASE is emitted twice. This does not happen in the absence of .cctor.

using runtime / CG2 build from commit: 810a7f9cd78d6611c0473940cb733231f3eabc06.

category:implementation theme:ready-to-run skill-level:beginner cost:small impact:small

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.

Issue Details
Code: ```c# static class Foo { public static int Bar { get; } = 1; static Foo() { } // even the empty cctor will force it to emit the // helper twice for each static property's getter. } ``` crossgen'd with: ```sh crossgen2 test.dll @refs.rsp --out test.r2r.dll \ --codegenopt NgenDisasm='*' --codegenopt JitDiffableDasm=1 --parallelism 1 --optimize ``` generates the following code for get_Bar(): ```asm ; Assembly listing for method Foo:get_Bar():int ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix ; ReadyToRun compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; Final local variable assignments ; ;# V00 OutArgs [V00 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace" ; ; Lcl frame size = 8 G_M39382_IG01: push rax ;; size=1 bbWeight=1 PerfScore 1.00 G_M39382_IG02: call [CORINFO_HELP_READYTORUN_STATIC_BASE] call [CORINFO_HELP_READYTORUN_STATIC_BASE] mov eax, dword ptr [rax+52] ;; size=15 bbWeight=1 PerfScore 8.00 G_M39382_IG03: add rsp, 8 ret ;; size=5 bbWeight=1 PerfScore 1.25 ; Total bytes of code 21, prolog size 1, PerfScore 12.35, instruction count 6, allocated bytes for code 21 (MethodHash=a7166629) for method Foo:get_Bar():int ; ============================================================ ``` `CORINFO_HELP_READYTORUN_STATIC_BASE` is emitted twice. This does not happen in the absence of .cctor. using runtime / CG2 build from commit: 810a7f9cd78d6611c0473940cb733231f3eabc06.
Author: am11
Assignees: -
Labels: `area-CodeGen-coreclr`, `untriaged`
Milestone: -
jakobbotsch commented 2 years ago

Under the hood crossgen2 creates two different entry points for this helper that, to the JIT, is named the same thing. Internally in R2R the first helper call is a CctorTrigger helper call, while the second is a GetNonGCStaticBase helper call. It is more visible if you use r2rdump (which is broken right now, opened #69655):

int Program+Foo.get_Bar()
Id: 2
StartAddress: 0x000017B0
Size: 24 bytes
UnwindRVA: 0x000016F4
Version:            1
Flags:              0x03 EHANDLER UHANDLER
SizeOfProlog:       0x0004
CountOfUnwindCodes: 1
FrameRegister:      None
FrameOffset:        0x0
PersonalityRVA:     0x17E8
UnwindCode[0]: CodeOffset 0x0004 FrameOffset 0x0000 NextOffset 0x0 Op 40

Debug Info
    Bounds:
    Native Offset: 0x0, Prolog, Source Types: StackEmpty
    Native Offset: 0xA, IL Offset: 0x0000, Source Types: StackEmpty
    Native Offset: 0x13, Epilog, Source Types: StackEmpty

17b0: 48 83 ec 28               sub     rsp, 40
                                UWOP_ALLOC_SMALL 40                                0

17b4: ff 15 e6 18 00 00         call    qword ptr [0x30a0]    // Program+Foo (CCTOR_TRIGGER)
17ba: ff 15 e8 18 00 00         call    qword ptr [0x30a8]    // Program+Foo (STATIC_BASE_NON_GC)
17c0: 8b 40 34                  mov     eax, dword ptr [rax + 52]
17c3: 48 83 c4 28               add     rsp, 40
17c7: c3                        ret

Since these have different entry points the JIT does not e.g. CSE them.

am11 commented 2 years ago

@jakobbotsch, thank you for the explanation and fixing r2rdump.

trylek commented 2 years ago

I think there are several aspects to consider:

jakobbotsch commented 2 years ago
  • Do we want to disambiguate CORINFO_HELP_READYTORUN_STATIC_BASE_NON_GC and CORINFO_HELP_READYTORUN_CCTOR_TRIGGER in disassmembly? (the command I used was extracted from godbolt, so it might be interesting for user if this distinction is visible in the output: https://godbolt.org/z/dcxbsTcn5)

The only thing that disambiguates these two calls to the JIT is the fact that crossgen2 returned different entry points for them. This is an R2R-only thing, for runtime JIT the entry point is uniquely identified from the helper name. Since I don't think we want to introduce new JIT/R2R helpers just for cosmetic reasons the only disambiguating thing I think we could do is have the JIT print the entry point addresses if we notice the same helper being used with different entry points.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details
Repro: ```c# static class Foo { public static int Bar { get; } = 1; static Foo() { } // even the empty cctor will force it to emit the helper // call twice in each static property's getter. } ``` crossgen'd with: ```sh crossgen2 test.dll @refs.rsp --out test.r2r.dll \ --codegenopt NgenDisasm='*' --codegenopt JitDiffableDasm=1 --parallelism 1 --optimize ``` emits the following asm for `get_Bar()`: ```asm ; Assembly listing for method Foo:get_Bar():int ; Emitting BLENDED_CODE for X64 CPU with SSE2 - Unix ; ReadyToRun compilation ; optimized code ; rsp based frame ; partially interruptible ; No PGO data ; Final local variable assignments ; ;# V00 OutArgs [V00 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace" ; ; Lcl frame size = 8 G_M39382_IG01: push rax ;; size=1 bbWeight=1 PerfScore 1.00 G_M39382_IG02: call [CORINFO_HELP_READYTORUN_STATIC_BASE] call [CORINFO_HELP_READYTORUN_STATIC_BASE] mov eax, dword ptr [rax+52] ;; size=15 bbWeight=1 PerfScore 8.00 G_M39382_IG03: add rsp, 8 ret ;; size=5 bbWeight=1 PerfScore 1.25 ; Total bytes of code 21, prolog size 1, PerfScore 12.35, instruction count 6, allocated bytes for code 21 (MethodHash=a7166629) for method Foo:get_Bar():int ; ============================================================ ``` `CORINFO_HELP_READYTORUN_STATIC_BASE` is emitted twice. This does not happen in the absence of .cctor. using runtime / CG2 build from commit: 810a7f9cd78d6611c0473940cb733231f3eabc06.
Author: am11
Assignees: -
Labels: `area-CodeGen-coreclr`
Milestone: 7.0.0
am11 commented 2 years ago

Sounds reasonable. I have just noticed that with NativeAOT, it only calls the helper once:

# dotnet7 publish -o dist --use-current-runtime -c Release -p:PublishAot=true
$ gdb dist/nativelib1.so -batch -ex 'set disassembly-flavor intel' -ex 'disassemble nativelib1_Foo__get_Bar'
Dump of assembler code for function nativelib1_Foo__get_Bar:
   0x0000000000141f80 <+0>: push   rax
   0x0000000000141f81 <+1>: call   0xcb03c <__GetNonGCStaticBase_nativelib1_Foo>
   0x0000000000141f86 <+6>: mov    eax,DWORD PTR [rax]
   0x0000000000141f88 <+8>: add    rsp,0x8
   0x0000000000141f8c <+12>:    ret    
End of assembler dump.
jakobbotsch commented 2 years ago

NAOT might have evaluated the static constructor at compile time since it is trivial.