Closed EgorBo closed 2 years ago
If it doesn't repro, run with for(;;) { /path/to/exe }
in powershell
Doesn't repro with debugger attached 😞
Looks like it always happen while in System.SpanHelpers:LastIndexOfValueType[short,DontNegate'1[short]](byref,short,int):int
; Assembly listing for method System.SpanHelpers:LastIndexOfValueType[short,DontNegate`1[short]](byref,short,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; fully interruptible
; No PGO data
; 4 inlinees with PGO data; 36 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T01] ( 15, 19 ) byref -> rsi single-def
; V01 arg1 [V01,T16] ( 6, 4 ) short -> rbx single-def
; V02 arg2 [V02,T03] ( 17, 21 ) int -> rdi
; V03 loc0 [V03,T00] ( 15, 18 ) long -> rax
; V04 loc1 [V04,T29] ( 8, 14.50) simd32 -> mm1
; V05 loc2 [V05,T31] ( 3, 5 ) simd32 -> mm0
; V06 loc3 [V06,T05] ( 6, 17 ) byref -> rax
; V07 loc4 [V07,T30] ( 8, 14.50) simd16 -> mm1
; V08 loc5 [V08,T32] ( 3, 5 ) simd16 -> mm0
; V09 loc6 [V09,T06] ( 6, 17 ) byref -> rax
; V10 OutArgs [V10 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V11 tmp1 [V11 ] ( 0, 0 ) int -> zero-ref
;* V12 tmp2 [V12 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V13 tmp3 [V13 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V14 tmp4 [V14 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V15 tmp5 [V15 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V16 tmp6 [V16,T07] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V17 tmp7 [V17 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V18 tmp8 [V18,T08] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V19 tmp9 [V19 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V20 tmp10 [V20,T09] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V21 tmp11 [V21 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V22 tmp12 [V22,T10] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V23 tmp13 [V23 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V24 tmp14 [V24,T11] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V25 tmp15 [V25 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V26 tmp16 [V26,T12] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V27 tmp17 [V27 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V28 tmp18 [V28,T13] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V29 tmp19 [V29 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V30 tmp20 [V30,T14] ( 0, 0 ) short -> zero-ref "Inlining Arg"
;* V31 tmp21 [V31 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
; V32 tmp22 [V32,T19] ( 2, 2 ) short -> rdx "Inlining Arg"
;* V33 tmp23 [V33 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
; V34 tmp24 [V34,T20] ( 2, 2 ) short -> rdx "Inlining Arg"
;* V35 tmp25 [V35 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
; V36 tmp26 [V36,T21] ( 2, 2 ) short -> rdx "Inlining Arg"
;* V37 tmp27 [V37 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
; V38 tmp28 [V38,T22] ( 2, 2 ) short -> rdx "Inlining Arg"
;* V39 tmp29 [V39 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
; V40 tmp30 [V40,T15] ( 2, 16 ) short -> rdx "Inlining Arg"
;* V41 tmp31 [V41 ] ( 0, 0 ) bool -> zero-ref "Inlining Arg"
;* V42 tmp32 [V42 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V43 tmp33 [V43 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
; V44 tmp34 [V44,T23] ( 2, 1 ) int -> rdx "Inline stloc first use temp"
;* V45 tmp35 [V45 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V46 tmp36 [V46 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V47 tmp37 [V47 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
;* V48 tmp38 [V48 ] ( 0, 0 ) simd32 -> zero-ref "Inlining Arg"
; V49 tmp39 [V49,T24] ( 2, 1 ) int -> rax "Inline stloc first use temp"
;* V50 tmp40 [V50 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V51 tmp41 [V51 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V52 tmp42 [V52 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
;* V53 tmp43 [V53 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
; V54 tmp44 [V54,T25] ( 2, 1 ) int -> rcx "Inline stloc first use temp"
;* V55 tmp45 [V55 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V56 tmp46 [V56 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V57 tmp47 [V57 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
;* V58 tmp48 [V58 ] ( 0, 0 ) simd16 -> zero-ref "Inlining Arg"
; V59 tmp49 [V59,T26] ( 2, 1 ) int -> rax "Inline stloc first use temp"
;* V60 tmp50 [V60 ] ( 0, 0 ) int -> zero-ref "Inline stloc first use temp"
;* V61 tmp51 [V61 ] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
; V62 tmp52 [V62,T04] ( 10, 13 ) int -> rax "Single return block return value"
; V63 cse0 [V63,T27] ( 2, 1 ) int -> rax "CSE - moderate"
; V64 cse1 [V64,T28] ( 2, 1 ) int -> rax "CSE - moderate"
; V65 cse2 [V65,T02] ( 11, 9 ) int -> rcx "CSE - aggressive"
; V66 rat0 [V66,T17] ( 3, 3 ) long -> rax "ReplaceWithLclVar is creating a new local variable"
; V67 rat1 [V67,T18] ( 3, 3 ) long -> rax "ReplaceWithLclVar is creating a new local variable"
;
; Lcl frame size = 32
G_M46255_IG01: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref, nogc <-- Prolog IG
57 push rdi
56 push rsi
53 push rbx
4883EC20 sub rsp, 32
C5F877 vzeroupper
488BF1 mov rsi, rcx
; byrRegs +[rsi]
8BDA mov ebx, edx
418BF8 mov edi, r8d
;; size=18 bbWeight=1 PerfScore 5.00
G_M46255_IG02: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
85FF test edi, edi
7D1B jge SHORT G_M46255_IG04
;; size=4 bbWeight=1 PerfScore 1.25
G_M46255_IG03: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
48B9182180AA47010000 mov rcx, 0x147AA802118 ; "Expected non-negative length"
488B09 mov rcx, gword ptr [rcx]
; gcrRegs +[rcx]
488B142500000000 mov rdx, gword ptr [0000H]
; gcrRegs +[rdx]
FF15FF530F00 call [System.Diagnostics.Debug:Fail(System.String,System.String)]
; gcrRegs -[rcx rdx]
; gcr arg pop 0
;; size=27 bbWeight=0.50 PerfScore 3.62
G_M46255_IG04: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
83FF08 cmp edi, 8
0F8D85000000 jge G_M46255_IG14
;; size=9 bbWeight=1 PerfScore 1.25
G_M46255_IG05: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz, align
4863C7 movsxd rax, edi
48FFC8 dec rax
83FF04 cmp edi, 4
7C4F jl SHORT G_M46255_IG10
83C7FC add edi, -4
480FBF1446 movsx rdx, word ptr [rsi+2*rax]
480FBFCB movsx rcx, bx
3BD1 cmp edx, ecx
7506 jne SHORT G_M46255_IG06
E995010000 jmp G_M46255_IG22
90 align [1 bytes for IG11]
;; size=33 bbWeight=0.50 PerfScore 4.75
G_M46255_IG06: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
480FBF5446FE movsx rdx, word ptr [rsi+2*rax-02H]
3BD1 cmp edx, ecx
7507 jne SHORT G_M46255_IG07
FFC8 dec eax
E983010000 jmp G_M46255_IG22
;; size=17 bbWeight=0.50 PerfScore 3.75
G_M46255_IG07: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
480FBF5446FC movsx rdx, word ptr [rsi+2*rax-04H]
3BD1 cmp edx, ecx
7508 jne SHORT G_M46255_IG08
83C0FE add eax, -2
E971010000 jmp G_M46255_IG22
;; size=18 bbWeight=0.50 PerfScore 3.75
G_M46255_IG08: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
480FBF5446FA movsx rdx, word ptr [rsi+2*rax-06H]
3BD1 cmp edx, ecx
7508 jne SHORT G_M46255_IG09
83C0FD add eax, -3
E95F010000 jmp G_M46255_IG22
;; size=18 bbWeight=0.50 PerfScore 3.75
G_M46255_IG09: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
4883C0FC add rax, -4
;; size=4 bbWeight=0.50 PerfScore 0.12
G_M46255_IG10: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
85FF test edi, edi
0F8E5E010000 jle G_M46255_IG23
480FBFCB movsx rcx, bx
;; size=12 bbWeight=0.50 PerfScore 0.75
G_M46255_IG11: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, loop=IG11, byref, isz
FFCF dec edi
480FBF1446 movsx rdx, word ptr [rsi+2*rax]
3BD1 cmp edx, ecx
740F je SHORT G_M46255_IG13
48FFC8 dec rax
85FF test edi, edi
7FEE jg SHORT G_M46255_IG11
;; size=18 bbWeight=4 PerfScore 28.00
G_M46255_IG12: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz, align
; byrRegs -[rsi]
E943010000 jmp G_M46255_IG23
0F1F00 align [3 bytes for IG15]
;; size=8 bbWeight=0.50 PerfScore 1.00
G_M46255_IG13: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
E930010000 jmp G_M46255_IG22
;; size=5 bbWeight=0.50 PerfScore 1.00
G_M46255_IG14: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
; byrRegs +[rsi]
83FF10 cmp edi, 16
0F8CA6000000 jl G_M46255_IG18
480FBFCB movsx rcx, bx
C4E1796EC1 vmovd xmm0, ecx
C4E27D79C0 vpbroadcastw ymm0, ymm0
8D4FF0 lea ecx, [rdi-10H]
4863C1 movsxd rax, ecx
488D0446 lea rax, bword ptr [rsi+2*rax]
; byrRegs +[rax]
;; size=33 bbWeight=0.50 PerfScore 3.38
G_M46255_IG15: ; gcrefRegs=00000000 {}, byrefRegs=00000041 {rax rsi}, loop=IG15, byref, isz
C5FD7508 vpcmpeqw ymm1, ymm0, ymmword ptr[rax]
C4E27D17C9 vptest ymm1, ymm1
754A jne SHORT G_M46255_IG17
4883C0E0 add rax, -32
483BC6 cmp rax, rsi
73EC jae SHORT G_M46255_IG15
;; size=20 bbWeight=4 PerfScore 42.00
G_M46255_IG16: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
; byrRegs -[rax]
8BC7 mov eax, edi
A80F test al, 15
0F84FB000000 je G_M46255_IG23
C5FD750E vpcmpeqw ymm1, ymm0, ymmword ptr[rsi]
C4E27D17C9 vptest ymm1, ymm1
0F84EB000000 je G_M46255_IG23
C5FD10050B010000 vmovupd ymm0, ymmword ptr[reloc @RWD00]
C4E27500C8 vpshufb ymm1, ymm1, ymm0
C4E3FD00C1D8 vpermq ymm0, ymm1, -40
C5F9D7C0 vpmovmskb eax, xmm0
F30FBDC0 lzcnt eax, eax
F7D8 neg eax
83C01F add eax, 31
E9B9000000 jmp G_M46255_IG22
align [0 bytes for IG19]
;; size=62 bbWeight=0.50 PerfScore 12.00
G_M46255_IG17: ; gcrefRegs=00000000 {}, byrefRegs=00000041 {rax rsi}, byref
; byrRegs +[rax]
C5FD1005E6000000 vmovupd ymm0, ymmword ptr[reloc @RWD00]
C4E27500C0 vpshufb ymm0, ymm1, ymm0
C4E3FD00C0D8 vpermq ymm0, ymm0, -40
C5F9D7D0 vpmovmskb edx, xmm0
482BC6 sub rax, rsi
; byrRegs -[rax]
488BC8 mov rcx, rax
48C1E93F shr rcx, 63
4803C1 add rax, rcx
48D1F8 sar rax, 1
F30FBDD2 lzcnt edx, edx
F7DA neg edx
8D44101F lea eax, [rax+rdx+1FH]
E981000000 jmp G_M46255_IG22
;; size=54 bbWeight=0.50 PerfScore 8.00
G_M46255_IG18: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref
480FBFCB movsx rcx, bx
C5F96EC1 vmovd xmm0, ecx
C4E27979C0 vpbroadcastw xmm0, xmm0
8D47F8 lea eax, [rdi-08H]
4898 cdqe
488D0446 lea rax, bword ptr [rsi+2*rax]
; byrRegs +[rax]
;; size=22 bbWeight=0.50 PerfScore 2.25
G_M46255_IG19: ; gcrefRegs=00000000 {}, byrefRegs=00000041 {rax rsi}, loop=IG19, byref, isz
C5F97508 vpcmpeqw xmm1, xmm0, xmmword ptr [rax]
C4E27917C9 vptest xmm1, xmm1
7534 jne SHORT G_M46255_IG21
4883C0F0 add rax, -16
483BC6 cmp rax, rsi
73EC jae SHORT G_M46255_IG19
;; size=20 bbWeight=4 PerfScore 34.00
G_M46255_IG20: ; gcrefRegs=00000000 {}, byrefRegs=00000040 {rsi}, byref, isz
; byrRegs -[rax]
8BC7 mov eax, edi
A807 test al, 7
745A je SHORT G_M46255_IG23
C5F9750E vpcmpeqw xmm1, xmm0, xmmword ptr [rsi]
C4E27917C9 vptest xmm1, xmm1
744E je SHORT G_M46255_IG23
C4E271000D74000000 vpshufb xmm1, xmm1, xmmword ptr [reloc @RWD00]
C5F9D7C1 vpmovmskb eax, xmm1
F30FBDC0 lzcnt eax, eax
F7D8 neg eax
83C01F add eax, 31
EB2A jmp SHORT G_M46255_IG22
;; size=41 bbWeight=0.50 PerfScore 8.50
G_M46255_IG21: ; gcrefRegs=00000000 {}, byrefRegs=00000041 {rax rsi}, byref
; byrRegs +[rax]
C4E27100055C000000 vpshufb xmm0, xmm1, xmmword ptr [reloc @RWD00]
C5F9D7C8 vpmovmskb ecx, xmm0
482BC6 sub rax, rsi
; byrRegs -[rax]
488BD0 mov rdx, rax
48C1EA3F shr rdx, 63
4803C2 add rax, rdx
48D1F8 sar rax, 1
33D2 xor edx, edx
F30FBDD1 lzcnt edx, ecx
F7DA neg edx
8D44101F lea eax, [rax+rdx+1FH]
;; size=41 bbWeight=0.50 PerfScore 4.62
G_M46255_IG22: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, epilog, nogc
; byrRegs -[rsi]
C5F877 vzeroupper
4883C420 add rsp, 32
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
;; size=11 bbWeight=2 PerfScore 7.50
G_M46255_IG23: ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
B8FFFFFFFF mov eax, -1
;; size=5 bbWeight=0.50 PerfScore 0.12
G_M46255_IG24: ; , epilog, nogc, extend
C5F877 vzeroupper
4883C420 add rsp, 32
5B pop rbx
5E pop rsi
5F pop rdi
C3 ret
;; size=11 bbWeight=0.50 PerfScore 1.88
RWD00 dq 0F0D0B0907050301h, 8080808080808080h, 0F0D0B0907050301h, 8080808080808080h
; Total bytes of code 511, prolog size 18, PerfScore 234.45, instruction count 146, allocated bytes for code 522 (MethodHash=89f74b50) for method System.SpanHelpers:LastIndexOfValueType[short,DontNegate`1[short]](byref,short,int):int
; ============================================================
Hm.. happens here it seems (perhaps add rax -32
becomes an exterior pointer?
G_M46255_IG15: ; gcrefRegs=00000000 {}, byrefRegs=00000041 {rax rsi}, loop=IG15, byref, isz
00007ffe`c5fa2160 C5FD7508 vpcmpeqw ymm1, ymm0, ymmword ptr[rax]
00007ffe`c5fa2164 C4E27D17C9 vptest ymm1, ymm1
00007ffe`c5fa2169 754A jne SHORT G_M46255_IG17
00007ffe`c5fa216b 4883C0E0 add rax, -32
00007ffe`c5fa216f 483BC6 cmp rax, rsi ;;; <-----
00007ffe`c5fa2172 73EC jae SHORT G_M46255_IG15
;; size=20 bbWeight=4 PerfScore 42.00
this logic looks broken to me
Yes, it is a GC hole in LastIndexOfValueType
implementation.
<rant>
It is why I dislike byref
arithmetic being used in a complex code like this to save a few extra instructions for pinning. I am sure this is not the only GC hole of this kind that we have.</rant>
Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.
Author: | EgorBo |
---|---|
Assignees: | - |
Labels: | `area-System.Memory`, `blocking-release`, `area-VM-coreclr` |
Milestone: | 7.0.0 |
cc @adamsitnik @danmoseley One more bad regression from the vectorization change.
It’s worth noting ref arithmetic being difficult is one reason we have LoadUnsafe(ref T, nuint)
. It’s supposed to allow you a more “regular” usage much closer to regular arrays or pointer lookup and avoids needing to manually update a byref /etc. instead, just get the base byref and do a regular indexed for loop like you would with an array
The problem in the algorithm visualized (just for reference):
How would using pointer arithmetic instead of ref arithmetic have fixed this? Is the argument just that it might be easier to see the mistake in the algorithm if the code read currentSearchSpace -= Vector256<TValue>.Count;
rather than currentSearchSpace = ref Unsafe.Subtract(ref currentSearchSpace, Vector256<TValue>.Count);
such that there was less crowding visually? If we're erroneously dereferencing a ref/pointer that's before the object, it's a bug, and whether the object is pinned would seemingly just impact how the bug manifests.
Is the argument just that it might be easier to see the mistake in the algorithm if the code read currentSearchSpace -= Vector256
.Count;
currentSearchSpace -= Vector256<TValue>.Count
would work just fine with unmanaged pointer arithmetic. We would have unmanaged pointer that points outside the buffer, but that's not a problem unless it is dereferenced. And if it is dereferenced, the tests we have to verify buffer overruns should catch it.
We would have unmanaged pointer that points outside the buffer, but that's not a problem unless it is dereferenced.
This is the part I'm not understanding. What goes wrong with a managed reference that's pointing into an arbitrary location but that's not dereferenced? Is the issue that the GC itself might dereference it, and that could be an issue if it were e.g. past a page boundary?
Tanner answered my question here https://github.com/dotnet/runtime/pull/75857#discussion_r974672286.
The following program:
crashes with
with
on Checked runtime (win-x64).
Even on .NET 7.0 branch for me.