Open performanceautofiler[bot] opened 1 year ago
Tagging subscribers to this area: @dotnet/area-system-buffers See info in area-owners.md if you want to be subscribed.
Author: | performanceautofiler[bot] |
---|---|
Assignees: | AndyAyersMS |
Labels: | `area-System.Buffers`, `os-windows`, `arch-x64`, `runtime-coreclr` |
Milestone: | - |
Commit range is https://github.com/dotnet/runtime/compare/7afd85d1fd0b9edf0e2b58108caf74509930c6e5...edd6d6326f853be985172c3b7514acf33675ec82. Probably https://github.com/dotnet/runtime/pull/85654, @jakobbotsch? Link to the benchmark.
Similar stuff in https://github.com/dotnet/runtime/issues/85994.
Seems unlikely, #85654 had no diffs related to those benchmarks. But I don't see how the other commits would affect those benchmarks either.
Also could be https://github.com/dotnet/runtime/pull/85559 -- perhaps, some empty arrays got moved and so perturbed the alignment of the benchmark array?
This one is worth investigating more. It remains almost 1.5x slower with no clear culprit.
I'll take a closer look.
Yes this is from #85559, cc @EgorBo.
We end up putting a static array field used in the micro benchmark on the non-GC heap, but then we constant propagate its address into a loop and regress performance of the loop.
I wonder if we can generally avoid propagating from low weight blocks to high weight blocks, although it's probably not that simple. It also doesn't seem feasible to fix this in 8.0, so I will move this to 9.0.
-Importing BB01 (PC=000) of 'System.Buffers.Binary.Tests.BinaryReadAndWriteTests:MeasureReverseUsingNtoH():int[]:this'
- [ 0] 0 (0x000) ldsfld 04000B90
-Checking if we can import 'static readonly' as a jit-time constant...
- [ 1] 5 (0x005) stloc.0Querying runtime about current class of field <unknown class>:<unknown field> (declared as int[])
-Runtime reports field is init-only and initialized and has class int[]
-
-lvaUpdateClass: Updating class for V01 from (00007FFAE8373060) int[] to (00007FFAE8373060) int[] [exact]
-
-
-STMT00000 ( 0x000[E-] ... ??? )
- [000003] -A--G------ ▌ ASG ref
- [000002] D------N--- ├──▌ LCL_VAR ref V01 loc0
- [000001] #---G------ └──▌ IND ref
- [000000] H---------- └──▌ CNS_INT(h) long 0x1f0ccc06c08 const ptr Fseq[<unknown field>]
+Importing BB01 (PC=000) of 'System.Buffers.Binary.Tests.BinaryReadAndWriteTests:MeasureReverseUsingNtoH():int[]:this'
+ [ 0] 0 (0x000) ldsfld 04000B90
+Checking if we can import 'static readonly' as a jit-time constant... ... success! The value is:
+ [000000] H---------- ▌ CNS_INT(h) ref
+
+ [ 1] 5 (0x005) stloc.0
+
+STMT00000 ( 0x000[E-] ... ??? )
+ [000002] -A--------- ▌ ASG ref
+ [000001] D------N--- ├──▌ LCL_VAR ref V01 loc0
+ [000000] H---------- └──▌ CNS_INT(h) ref
System.Buffers.Binary.Tests.BinaryReadAndWriteTests.MeasureReverseUsingNtoH
Hot functions:
BinaryReadAndWriteTests.MeasureReverseUsingNtoH
(Tier-1)
Runnable_0.WorkloadActionUnroll
(Tier-1)
System.Buffers.Binary.Tests.BinaryReadAndWriteTests.MeasureReverseEndianness
Hot functions:
BinaryReadAndWriteTests.MeasureReverseEndianness
(Tier-1)
Runnable_0.WorkloadActionUnroll
(Tier-1)
I'll actually leave this in 8.0 for a bit, maybe someone else has an idea about something we can do.
Only seems to affect intel x64.
I think we should do something here, but I do not see any possible surgical fix. I think ideally this is fixed by 1) avoiding this propagation of constants into loops/higher weighted blocks in the first place, and 2) some rematerialization support for constants to not regress register allocation when we do 1. Will move this to 9.0.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Moving to 10 for the same reason as the above. We sadly did not get around to revisiting some of the necessities for improving cases like this one.
Run Information
Regressions in System.Buffers.Binary.Tests.BinaryReadAndWriteTests
Test Report
Repro
General Docs link: https://github.com/dotnet/performance/blob/main/docs/benchmarking-workflow-dotnet-runtime.md
Payloads
Baseline Compare