System.Collections.Sort<BigStruct>.LinqQuery has regressed on all configs except Windows 64 bit

adamsitnik commented 2 years ago

This regression seems to affect all configs except Windows 64 bit.

Repro:

git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 net7.0 --filter 'System.Collections.Sort<BigStruct>.LinqQuery'

Ubuntu Historical results

The diff points to https://github.com/dotnet/runtime/pull/55604 (cc @alexcovington) and https://github.com/dotnet/runtime/pull/59287 (cc @AndyAyersMS)

Windows Historical results

| Result | Base      | Diff      | Ratio | Operating System      | Bit   |
| ------ | ---------:| ---------:| -----:| --------------------- | ----- |
| Same   |  47068.40 |  46996.12 |  1.00 | Windows 11            | X64   |
| Same   |  25061.00 |  25213.92 |  0.99 | Windows 11            | X64   |
| Same   |  81332.07 |  82470.68 |  0.99 | Windows 11            | X64   |
| Same   |  48471.02 |  49394.98 |  0.98 | Windows 10            | X64   |
| Same   |  61753.97 |  65909.26 |  0.94 | Windows 11            | X64   |
| Same   |  79322.94 |  78292.41 |  1.01 | Windows 11            | X64   |
| Slower |  33152.41 |  48551.85 |  0.68 | ubuntu 18.04          | X64   |
| Slower |  33670.35 |  49233.18 |  0.68 | ubuntu 20.04          | X64   |
| Slower |  65475.08 |  84542.75 |  0.77 | ubuntu 18.04          | X64   |
| Same   | 102906.71 |  95691.63 |  1.08 | ubuntu 18.04          | X64   |
| Slower |  78941.99 |  99516.66 |  0.79 | pop 20.04             | X64   |
| Slower |  58025.14 |  76420.12 |  0.76 | alpine 3.13           | X64   |
| Slower |  58358.38 |  87952.92 |  0.66 | debian 11             | X64   |
| Same   |  39738.00 |  38447.43 |  1.03 | macOS Monterey 12.2.1 | Arm64 |
| Same   |  81077.12 |  83539.94 |  0.97 | Windows 10            | Arm64 |
| Same   |  84261.45 |  85918.34 |  0.98 | Windows 11            | Arm64 |
| Slower |  51385.76 |  75022.36 |  0.68 | Windows 11            | X86   |
| Slower |  68915.32 |  91940.60 |  0.75 | Windows 10            | X86   |
| Slower |  61701.11 |  79972.87 |  0.77 | Windows 10            | X86   |
| Slower |  57559.08 |  70356.86 |  0.82 | Windows 10            | X86   |
| Same   | 151162.22 | 145951.59 |  1.04 | Windows 10            | Arm   |
| Slower |  90819.89 | 108997.55 |  0.83 | macOS Big Sur 11.6.3  | X64   |
| Slower |  73211.06 |  98121.94 |  0.75 | macOS Monterey 12.2.1 | X64   |
| Slower |  79186.88 | 106613.19 |  0.74 | macOS Monterey 12.2.1 | X64   |
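
For readers checking the numbers: the Ratio column appears to be Base divided by Diff, so values below 1.00 mean the Diff run took longer. The sketch below recomputes it for a handful of rows copied from the table. The function name `ratio` is only illustrative. How the Result column decides between "Same" and "Slower" is not stated in the thread, so it is not reproduced here.

```python
# A few rows copied from the table: (base, diff, reported ratio, OS, bitness).
rows = [
    (47068.40, 46996.12, 1.00, "Windows 11", "X64"),
    (33152.41, 48551.85, 0.68, "ubuntu 18.04", "X64"),
    (58358.38, 87952.92, 0.66, "debian 11", "X64"),
    (68915.32, 91940.60, 0.75, "Windows 10", "X86"),
    (39738.00, 38447.43, 1.03, "macOS Monterey 12.2.1", "Arm64"),
]


def ratio(base: float, diff: float) -> float:
    """Base time divided by diff time; below 1.0 means the diff run is slower."""
    return base / diff


for base, diff, reported, os_name, bits in rows:
    computed = ratio(base, diff)
    # Print the recomputed value next to the one reported in the table.
    print(f"{os_name:<22} {bits:<5} computed={computed:.2f} reported={reported:.2f}")
```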

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.

Issue Details

This regression seems to affect all configs except Windows 64 bit.

Repro:

```cmd
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f net6.0 net7.0 --filter 'System.Collections.Sort<BigStruct>.LinqQuery'
```

[Ubuntu Historical results](https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fSystem.Collections.Sort(BigStruct).LinqQuery(Size%3a%20512).html)

![image](https://user-images.githubusercontent.com/6011991/158862045-772bd4e8-61f0-4592-be5a-3e6e543c6f61.png)

The [diff](https://github.com/dotnet/runtime/compare/c980180198a2457cba656a98fd7b4f647cf401e0...eafc6f18d5bbcf0f0c70739ce4c8139d66a9e099) points to https://github.com/dotnet/runtime/pull/55604 (cc @alexcovington) and https://github.com/dotnet/runtime/pull/59287 (cc @AndyAyersMS)

[Windows Historical results](https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_Windows%2010.0.18362%2fSystem.Collections.Sort(BigStruct).LinqQuery(Size%3a%20512).html)

![image](https://user-images.githubusercontent.com/6011991/158862298-6644e52c-72da-463d-8ed0-8f3cf1e03f14.png)

Author:	adamsitnik
Assignees:	-
Labels:	`area-System.Collections`, `tenet-performance`
Milestone:	-

AndyAyersMS commented 2 years ago

https://github.com/dotnet/runtime/pull/59287 is locked so doesn't get cross linked. That seems unfortunate.

That change should have purely impacted jit diagnostics, so it's unlikely to have caused regressions.

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Author:	adamsitnik
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`
Milestone:	7.0.0

AndyAyersMS commented 2 years ago

Digging through, it looks like we expected this to be resolved -- see https://github.com/dotnet/perf-autofiling-issues/issues/1501#issuecomment-926027832

But that only fixed the issue on Windows; Ubuntu did not benefit. So we still have a regression.

[Image: newplot - 2022-08-08T081954 386]

(Windows is slightly worse off too)

[Image: newplot - 2022-08-08T082051 573]

AndyAyersMS commented 2 years ago

Looks like this is still unassigned. I'll take it for now.

AndyAyersMS commented 2 years ago

Can reproduce running locally (via wsl2)

BenchmarkDotNet=v0.13.1.1823-nightly, OS=ubuntu 20.04
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22408.1
  [Host]     : .NET 7.0.0 (7.0.22.40308), X64 RyuJIT
  Job-CFAJOE : .NET 5.0.1 (5.0.120.57516), X64 RyuJIT
  Job-JPHJBC : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-KPSCOL : .NET 7.0.0 (7.0.22.40308), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  InvocationCount=5000  IterationTime=250.0000 ms
MaxIterationCount=20  MinIterationCount=15  MinWarmupIterationCount=6
UnrollFactor=1  WarmupCount=-1

Method	Job	Runtime	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Gen 1	Allocated	Alloc Ratio
LinqQuery	Job-CFAJOE	.NET 5.0	net5.0	512	56.80 us	0.729 us	0.682 us	56.79 us	55.69 us	58.34 us	1.00	0.00	5.4000	0.4000	34.33 KB	1.00
LinqQuery	Job-JPHJBC	.NET 6.0	net6.0	512	58.25 us	0.707 us	0.662 us	57.97 us	57.41 us	59.44 us	1.03	0.02	5.4000	0.4000	34.33 KB	1.00
LinqQuery	Job-KPSCOL	.NET 7.0	net7.0	512	72.44 us	1.321 us	1.235 us	72.28 us	70.45 us	74.83 us	1.28	0.03	5.6000	0.6000	34.33 KB	1.00

AndyAyersMS commented 2 years ago

@adamsitnik is it expected that with -p EP I won't get cpu sample events? If so, any way to enable these via the command line?

AndyAyersMS commented 2 years ago

Hmm, I guess there are sample events but not ones that perfview recognizes?

AndyAyersMS commented 2 years ago

From the above I can get a crude profile of sorts. But not sure it is helping me spot which method(s) have regressed.

adamsitnik commented 2 years ago

> I guess there are sample events but not ones that perfview recognizes?

In the case of EventPipe, we just get different CPU samples (events emitted by the .NET Runtime, not the OS). In PerfView you need to open the "Thread Time" view (not "CPU Stacks" as usual).

Or you can take the .speedscope file generated by BDN:

Exported 1 trace file(s). Example:
D:\projects\performance\artifacts\bin\MicroBenchmarks\Release\net7.0\BenchmarkDotNet.Artifacts\System.Collections.Sort_BigStruct_.LinqQuery(Size_ 512)-20220809-091754.speedscope.json

and open it with speedscope

AndyAyersMS commented 2 years ago

Still didn't find that very helpful. But here's perf (via WSL2) on the two:

If this is credible then the issue is in this bit of code.

;; 6.0 

; Assembly listing for method GenericComparer`1:Compare(BigStruct,BigStruct):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; Tier-1 compilation
; optimized code
; rbp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def
;  V01 arg1         [V01,T03] (  2,  1.36)  struct (32) [rbp+10H]   do-not-enreg[SF] ld-addr-op single-def
;  V02 arg2         [V02,T04] (  1,  1   )  struct (32) [rbp+30H]   do-not-enreg[SB] single-def
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;  V04 tmp1         [V04,T01] (  2,  4   )  struct (32) [rbp-20H]   do-not-enreg[SFB] "Inlining Arg"
;  V05 tmp2         [V05,T02] (  4,  1.50)     int  ->  rax         "Inline return value spill temp"
;  V06 tmp3         [V06,T00] (  3,  4.71)     int  ->  rax         "Inlining Arg"
;
; Lcl frame size = 32

G_M25642_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC20             sub      rsp, 32
       C5F877               vzeroupper 
       488D6C2420           lea      rbp, [rsp+20H]
                        ;; bbWeight=1    PerfScore 2.75
G_M25642_IG02:              ;; offset=000DH
       C5FA6F4530           vmovdqu  xmm0, xmmword ptr [rbp+30H]
       C5FA7F45E0           vmovdqu  xmmword ptr [rbp-20H], xmm0
       C5FA6F4540           vmovdqu  xmm0, xmmword ptr [rbp+40H]
       C5FA7F45F0           vmovdqu  xmmword ptr [rbp-10H], xmm0
       8B45EC               mov      eax, dword ptr [rbp-14H]
       39451C               cmp      dword ptr [rbp+1CH], eax
       7C14                 jl       SHORT G_M25642_IG07
                        ;; bbWeight=1    PerfScore 7.00
G_M25642_IG03:              ;; offset=0029H
       39451C               cmp      dword ptr [rbp+1CH], eax
       7F08                 jg       SHORT G_M25642_IG06
                        ;; bbWeight=0.36 PerfScore 0.71
G_M25642_IG04:              ;; offset=002EH
       33C0                 xor      eax, eax
                        ;; bbWeight=0.26 PerfScore 0.06
G_M25642_IG05:              ;; offset=0030H
       4883C420             add      rsp, 32
       5D                   pop      rbp
       C3                   ret      
                        ;; bbWeight=1    PerfScore 1.75
G_M25642_IG06:              ;; offset=0036H
       B801000000           mov      eax, 1
       EBF3                 jmp      SHORT G_M25642_IG05
                        ;; bbWeight=0.10 PerfScore 0.22
G_M25642_IG07:              ;; offset=003DH
       B8FFFFFFFF           mov      eax, -1
       EBEC                 jmp      SHORT G_M25642_IG05
                        ;; bbWeight=0.14 PerfScore 0.32

versus

;; 7.0

; Assembly listing for method GenericComparer`1:Compare(BigStruct,BigStruct):int:this
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; Tier-1 compilation
; optimized code
; rbp based frame
; partially interruptible
; No PGO data
; 1 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd single-def
;  V01 arg1         [V01,T03] (  2,  1.35)  struct (32) [rbp+10H]   do-not-enreg[SF] ld-addr-op single-def
;  V02 arg2         [V02,T04] (  1,  1   )  struct (32) [rbp+30H]   do-not-enreg[S] single-def
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;  V04 tmp1         [V04,T01] (  2,  4   )  struct (32) [rbp-20H]   do-not-enreg[SF] "Inlining Arg"
;  V05 tmp2         [V05,T02] (  4,  1.50)     int  ->  rax         "Inline return value spill temp"
;  V06 tmp3         [V06,T00] (  3,  4.70)     int  ->  rax         "Inlining Arg"
;
; Lcl frame size = 32

G_M25642_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC20             sub      rsp, 32
       C5F877               vzeroupper 
       488D6C2420           lea      rbp, [rsp+20H]
                        ;; size=13 bbWeight=1    PerfScore 2.75
G_M25642_IG02:              ;; offset=000DH
       C5FE6F4530           vmovdqu  ymm0, ymmword ptr[rbp+30H]
       C5FE7F45E0           vmovdqu  ymmword ptr[rbp-20H], ymm0
       8B45EC               mov      eax, dword ptr [rbp-14H]
       39451C               cmp      dword ptr [rbp+1CH], eax
       7C17                 jl       SHORT G_M25642_IG07
                        ;; size=18 bbWeight=1    PerfScore 9.00
G_M25642_IG03:              ;; offset=001FH
       39451C               cmp      dword ptr [rbp+1CH], eax
       7F0B                 jg       SHORT G_M25642_IG06
                        ;; size=5 bbWeight=0.35 PerfScore 1.06
G_M25642_IG04:              ;; offset=0024H
       33C0                 xor      eax, eax
                        ;; size=2 bbWeight=0.25 PerfScore 0.06
G_M25642_IG05:              ;; offset=0026H
       C5F877               vzeroupper 
       4883C420             add      rsp, 32
       5D                   pop      rbp
       C3                   ret      
                        ;; size=9 bbWeight=1    PerfScore 2.75
G_M25642_IG06:              ;; offset=002FH
       B801000000           mov      eax, 1
       EBF0                 jmp      SHORT G_M25642_IG05
                        ;; size=7 bbWeight=0.10 PerfScore 0.22
G_M25642_IG07:              ;; offset=0036H
       B8FFFFFFFF           mov      eax, -1
       EBE9                 jmp      SHORT G_M25642_IG05
                        ;; size=7 bbWeight=0.15 PerfScore 0.33

AndyAyersMS commented 2 years ago

Note that with AVX/AVX2 disabled, .NET 6 and .NET 7 perf match (and both match .NET 6 with AVX enabled).

BenchmarkDotNet=v0.13.1.1823-nightly, OS=ubuntu 20.04
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-rc.1.22408.1
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-KAQRRV : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT
  Job-SXOIEW : .NET 7.0.0 (7.0.22.40308), X64 RyuJIT

EnvironmentVariables=COMPlus_EnableAVX2=0,COMPlus_EnableAVX=0  PowerPlanMode=00000000-0000-0000-0000-000000000000
InvocationCount=5000  IterationTime=250.0000 ms  MaxIterationCount=20
MinIterationCount=15  MinWarmupIterationCount=6  UnrollFactor=1
WarmupCount=-1

Method	Job	Runtime	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	Gen 0	Gen 1	Allocated	Alloc Ratio
LinqQuery	Job-KAQRRV	.NET 6.0	net6.0	512	55.18 us	0.762 us	0.675 us	55.05 us	54.16 us	56.73 us	1.00	5.4000	0.4000	34.33 KB	1.00
LinqQuery	Job-SXOIEW	.NET 7.0	net7.0	512	57.40 us	0.461 us	0.409 us	57.28 us	56.93 us	58.04 us	1.04	5.6000	0.6000	34.33 KB	1.00

Going to modify the jit so I can do this per-method and see if just disabling AVX for the comparer explains the perf loss.

AndyAyersMS commented 2 years ago

Looks like the regression comes from the use of YMM registers in the two hottest methods above

System.Linq.EnumerableSorter`2[BigStruct,BigStruct][System.Collections.BigStruct,System.Collections.BigStruct]:CompareAnyKeys(int,int)
System.Collections.Generic.GenericComparer`1[BigStruct][System.Collections.BigStruct]::Compare

In both cases there is a YMM store closely followed by a narrower load:

;; Compare

       C5FE7F45E0           vmovdqu  ymmword ptr[rbp-20H], ymm0
       8B45EC               mov      eax, dword ptr [rbp-14H]

;; CompareAnyKeys

       C5FE7F45C8           vmovdqu  ymmword ptr[rbp-38H], ymm0
       C5FA6F45C8           vmovdqu  xmm0, qword ptr [rbp-38H]
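
To make the "wide store, narrower load" relationship concrete, here is a small sketch that treats each instruction as a byte range relative to `rbp` and checks that the load reads only bytes the preceding store just wrote. The `Access` class is a hypothetical helper, not anything from the JIT. The 16-byte width for the CompareAnyKeys load is an assumption based on the `xmm0` destination. The thread does not say why this pattern is slow, and the sketch does not try to model that.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """A stack access: displacement from the frame register and width in bytes."""
    offset: int
    size: int

    def contains(self, other: "Access") -> bool:
        """True when `other` lies entirely inside the bytes this access touches."""
        return (self.offset <= other.offset
                and other.offset + other.size <= self.offset + self.size)


# Compare: 32-byte ymm store to [rbp-20H], then a 4-byte load from [rbp-14H].
compare_store = Access(offset=-0x20, size=32)
compare_load = Access(offset=-0x14, size=4)

# CompareAnyKeys: 32-byte ymm store to [rbp-38H], then an xmm load from [rbp-38H].
# Assumption: the load is treated as 16 bytes because the destination is xmm0.
keys_store = Access(offset=-0x38, size=32)
keys_load = Access(offset=-0x38, size=16)

for name, store, load in [("Compare", compare_store, compare_load),
                          ("CompareAnyKeys", keys_store, keys_load)]:
    inside = store.contains(load)
    delta = load.offset - store.offset  # byte position of the load within the store
    print(f"{name}: load inside store = {inside}, starts at byte +{delta}")
```

For Compare the load starts 12 bytes into the stored struct, which lines up with the `[+12]` field offset on `V04 tmp1` in the morph dump later in the thread.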

AndyAyersMS commented 2 years ago

On windows, there is similar codegen in Compare but not in CompareAnyKeys -- the latter because of ABI differences.

;; (windows) Compare

       C5FE7F442408         vmovdqu  ymmword ptr[rsp+08H], ymm0
       8B442414             mov      eax, dword ptr [rsp+14H]

Despite this, perf on Windows generally seems better (around 53us). Note the store above is misaligned (as is the store in Linux's CompareAnyKeys), if that matters.

Also note that in Compare the struct copy is really not needed. Seems like forward sub (or morph's copy prop) should get this case, but neither one sees the use:


;; tmp1 is single use
***** BB03
STMT00003 ( 0x010[E-] ... ??? )
               [000027] -A---------                         *  ASG       struct (copy)
               [000025] D------N---                         +--*  LCL_VAR   struct<System.Collections.BigStruct, 32> V04 tmp1         
               [000013] n----------                         \--*  OBJ       struct<System.Collections.BigStruct, 32>
               [000012] -----------                            \--*  ADDR      byref 
               [000010] -------N---                               \--*  LCL_VAR   struct<System.Collections.BigStruct, 32> V02 arg2         

***** BB03
STMT00009 ( INL01 @ 0x000[E-] ... ??? ) <- INLRT @ 0x010[E-]
               [000058] -A---------                         *  ASG       int   
               [000057] D------N---                         +--*  LCL_VAR   int    V06 tmp3         
               [000022] -----------                         \--*  FIELD     int    _int1
               [000021] -----------                            \--*  ADDR      byref 
               [000020] -------N---                               \--*  LCL_VAR   struct<System.Collections.BigStruct, 32> V04 tmp1         

;; fwd sub

    [000027]:  no next stmt use

;; morph

In BB01 New Local Copy     Assertion: V04 == V02, index = #01

fgMorphTree BB01, STMT00009 (before)
               [000058] -A---------                         *  ASG       int   
               [000057] D------N---                         +--*  LCL_VAR   int    V06 tmp3         
               [000022] -----------                         \--*  LCL_FLD   int    V04 tmp1         [+12]

AndyAyersMS commented 2 years ago

Verified this is mitigated with the preliminary changes from #73719.

This is beyond the scope of what we can fix for .net7, so I think we're going to have to live with this regression.

Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Gen 1	Allocated	Alloc Ratio
LinqQuery	Job-XBERYB	.net7	512	70.09 us	0.513 us	0.455 us	70.11 us	69.06 us	70.89 us	1.21	0.02	5.6000	0.6000	34.33 KB	1.00
LinqQuery	Job-WMUOPH	#73719	512	56.01 us	0.644 us	0.602 us	55.81 us	55.26 us	57.21 us	0.97	0.02	5.6000	0.6000	34.33 KB	1.00
LinqQuery	Job-NGYOUF	.net6	512	57.87 us	1.093 us	1.023 us	57.66 us	56.53 us	59.62 us	1.00	0.00	5.4000	0.4000	34.33 KB	1.00
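
As a quick sanity check on the Ratio column in this BenchmarkDotNet table: it appears to be each job's Mean divided by the baseline's Mean (here the .net6 row), so values above 1.00 are slower. That is the opposite direction from the Base/Diff ratio in the table at the top of the issue. The sketch below recomputes it from the Mean values. The helper name `ratios` is only illustrative, and BenchmarkDotNet's own calculation may differ in detail.

```python
# Mean times in microseconds, copied from the table above.
means_us = {".net7": 70.09, "#73719": 56.01, ".net6": 57.87}
baseline = ".net6"  # the row whose Ratio is reported as 1.00


def ratios(means: dict[str, float], base: str) -> dict[str, float]:
    """Each mean divided by the baseline mean; above 1.0 means slower than baseline."""
    return {name: mean / means[base] for name, mean in means.items()}


for toolchain, r in ratios(means_us, baseline).items():
    change = (r - 1.0) * 100.0
    print(f"{toolchain:<7} ratio={r:.2f} ({change:+.1f}% vs {baseline})")
```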

AndyAyersMS commented 2 years ago

This should be fixed by https://github.com/dotnet/runtime/pull/74384.

AndyAyersMS commented 2 years ago

(ubuntu x64)

[Image: newplot - 2022-09-02T152343 171]

dotnet / runtime

System.Collections.Sort<BigStruct>.LinqQuery has regressed on all configs except Windows 64 bit #66776