ShuiRuTian opened this issue 2 years ago.
Author: | ShuiRuTian |
---|---|
Assignees: | - |
Labels: | `tenet-performance`, `area-CodeGen-coreclr`, `untriaged` |
Milestone: | - |
This is really a question about the x86_64 architecture: is the `[rax + constant]` addressing mode available for the `INC` instruction? The answer is yes: `INC DWORD PTR [RAX + 8]` is valid, as confirmed with C++ inline asm. I'm not sure about its performance, though.
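A check along those lines might look like the following. It is a sketch under assumptions: the `Foo` layout and `inc_field` helper are hypothetical, and the inline asm uses GCC/Clang extended-asm syntax (AT&T operand order), so it only takes effect on x86-64 with those compilers and falls back to plain C++ elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: an 8-byte header followed by the int field `a`,
// so `a` lives at offset 8 from the start of the object.
struct Foo {
    std::int64_t header;
    std::int32_t a;
};
static_assert(offsetof(Foo, a) == 8, "field a is expected at offset 8");

// Increments p->a with a single memory-destination INC using [base + 8].
// On x86-64 GCC/Clang this assembles to e.g. `incl 8(%rdi)`, the AT&T
// spelling of `INC DWORD PTR [RDI + 8]`; elsewhere it falls back to C++.
inline void inc_field(Foo* p) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("incl 8(%0)" : : "r"(p) : "memory", "cc");
#else
    p->a += 1;
#endif
}
```

The `"memory"` clobber tells the compiler the asm writes through `p`, and `"cc"` covers the flags that `INC` updates.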
@huoyaoyuan Oops, sorry for not pointing that out. We can just look at the generated code for `Benchmarks.StructInline.NormalContainerUpdate`, which uses the `[base + offset]` pattern.
Edit: I deleted the example because it was wrong; it should use `ref` to work correctly. After that change, the result is not impressive.
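The deleted example is not available, but a common reason a struct update needs `ref` in C# is that assigning a struct field to a local makes a copy. The C++ analogue below is an assumption about what went wrong, with hypothetical `Inner`/`Container` types: updating a copy leaves the container unchanged, while updating through a reference (the counterpart of a C# `ref` local) modifies the field in place.

```cpp
#include <cassert>

// Hypothetical value types standing in for C# structs nested in a container.
struct Inner { int value; };
struct Container { Inner inner; };

// Like `var s = container.inner;` in C#: s is a copy, so the update is lost.
inline void update_copy(Container& c) {
    Inner s = c.inner;
    s.value += 1;
}

// Like `ref var s = ref container.inner;` in C#: s aliases the field.
inline void update_ref(Container& c) {
    Inner& s = c.inner;
    s.value += 1;
}
```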
Edit: this applies not only to `inc`, but also to `add`, `imul`, `and`, and `or`. I believe there are more.
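The operations listed above can each be written as a read-modify-write of one field at `[base + offset]`. In this sketch the types and helper names are hypothetical and the assembly in the comments is only typical output, not a guarantee. One known difference: x86-64 has memory-destination encodings for `INC`, `ADD`, `AND` and `OR`, but `IMUL` does not, so for a multiply the field can only be folded in as a source operand.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical nested value types: `counter` sits at a fixed offset
// (12 on typical ABIs) from the start of `Outer`.
struct Leaf  { std::int32_t pad; std::int32_t counter; };
struct Outer { std::int64_t header; Leaf leaf; };

// Each helper is a read-modify-write of one field at [base + offset].
// Optimizing x86-64 compilers commonly emit a single memory-destination
// instruction for these, e.g. `add dword ptr [rdi+12], esi`.
inline void incCounter(Outer* o)                 { o->leaf.counter += 1; }
inline void addCounter(Outer* o, std::int32_t v) { o->leaf.counter += v; }
inline void andCounter(Outer* o, std::int32_t m) { o->leaf.counter &= m; }
inline void orCounter(Outer* o, std::int32_t m)  { o->leaf.counter |= m; }

// IMUL has no memory-destination form: the field can only be a source
// operand (e.g. `imul eax, dword ptr [rdi+12]`), followed by a separate store.
inline void mulCounter(Outer* o, std::int32_t v) { o->leaf.counter *= v; }
```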
The trees in question:

```
***** BB01
STMT00005 ( INL03 @ 0x000[E-] ... ??? ) <- INLRT @ 0x005[--]
[000018] -A--G+------ *  ASG       byref
[000017] D----+-N---- +--*  LCL_VAR   byref  V03 tmp2
[000048] -----+------ \--*  ADD       byref
[000046] -----+------    +--*  LCL_VAR   ref    V02 tmp1
[000047] -----+------    \--*  CNS_INT   long   8 field offset Fseq[foo, a]

***** BB01
STMT00006 ( INL03 @ ??? ... ??? ) <- INLRT @ 0x005[--]
[000025] -A-XG+------ *  ASG       int
[000024] *--X-+-N---- +--*  IND       int
[000019] -----+------ |  \--*  LCL_VAR   byref  V03 tmp2
[000023] ---XG+------ \--*  ADD       int
[000021] *--XG+------    +--*  IND       int
[000020] -----+------    |  \--*  LCL_VAR   byref  V03 tmp2
[000022] -----+------    \--*  CNS_INT   int    1
```
The proposal is to forward-substitute `V03` into both of its uses.
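The proposed transformation can be sketched on a toy tree IR. Everything here is hypothetical (`Op`, `Node`, `forwardSub`, `show` are invented names, not RyuJIT's GenTree API) and the IR only loosely mimics the dump: it substitutes `V03 = ADD(V02, 8)` from STMT00005 into both `IND(V03)` uses in STMT00006, after which each indirection has the `[V02+8]` shape that an RMW `add dword ptr [reg+8], 1` could encode directly.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression IR loosely modeled on the JIT dump (hypothetical).
enum class Op { LclVar, CnsInt, Add, Ind, Asg };

struct Node;
using NodeP = std::shared_ptr<Node>;

struct Node {
    Op op;
    long val;  // local number for LclVar, constant for CnsInt, unused otherwise
    NodeP a;   // first operand (null for leaves)
    NodeP b;   // second operand (null for leaves and IND)
};

NodeP mk(Op op, long val = 0, NodeP a = nullptr, NodeP b = nullptr) {
    return std::make_shared<Node>(Node{op, val, a, b});
}

// Deep copy, so each substituted use gets its own tree (trees are not DAGs).
NodeP clone(const NodeP& n) {
    if (!n) return nullptr;
    return mk(n->op, n->val, clone(n->a), clone(n->b));
}

// Forward substitution: replace every use of local `lclNum` with a copy of `def`.
NodeP forwardSub(const NodeP& t, long lclNum, const NodeP& def) {
    if (!t) return nullptr;
    if (t->op == Op::LclVar && t->val == lclNum) return clone(def);
    return mk(t->op, t->val, forwardSub(t->a, lclNum, def),
              forwardSub(t->b, lclNum, def));
}

std::string lclName(long num) {
    std::string digits = std::to_string(num);
    return "V" + (digits.size() < 2 ? "0" + digits : digits);
}

// Prints a tree; an IND whose address is ADD(LclVar, CnsInt) is shown as
// the [base+offset] addressing mode an x86-64 backend could encode.
std::string show(const NodeP& n) {
    switch (n->op) {
    case Op::LclVar: return lclName(n->val);
    case Op::CnsInt: return std::to_string(n->val);
    case Op::Add:    return "ADD(" + show(n->a) + ", " + show(n->b) + ")";
    case Op::Ind:
        if (n->a->op == Op::Add && n->a->a->op == Op::LclVar &&
            n->a->b->op == Op::CnsInt)
            return "[" + show(n->a->a) + "+" + show(n->a->b) + "]";
        return "IND(" + show(n->a) + ")";
    case Op::Asg:    return show(n->a) + " = " + show(n->b);
    }
    return "?";
}
```

After the substitution, the original `V03 = ...` assignment is dead only if `V03` has no other uses, which is exactly the liveness question raised next.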
Doing this substitution will be a little tricky. In general it is not simple to know early, when the ordinary forward substitution runs, whether it will be profitable: on targets that don't have an RMW form of `ADD` supporting the `[base + offset]` addressing mode, it would be a pessimization. So it would have to be done late (in lowering), probably as part of forming the RMW instruction itself. That in turn runs into the problem of generally not knowing whether the local in question has any downstream uses (perhaps early lowering or rationalization could do ref counting to help with that).
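The profitability and liveness conditions above could be combined into one late check along these lines. This is a hypothetical sketch: `StmtInfo`, `countUses`, and `shouldFoldIntoRmw` are invented names, and the use counts are assumed to cover every statement in the method, standing in for the ref counting that early lowering or rationalization might provide.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical per-statement summary for this sketch: the locals it reads.
struct StmtInfo {
    std::vector<int> uses;
};

// Counts uses of every local across all statements (a stand-in for ref counts).
std::map<int, int> countUses(const std::vector<StmtInfo>& stmts) {
    std::map<int, int> counts;
    for (const auto& s : stmts)
        for (int lcl : s.uses) ++counts[lcl];
    return counts;
}

// Folding `lcl = base + offset` into an RMW instruction only pays off when
// the target has an RMW form with [base + offset] addressing and every use
// of `lcl` belongs to that RMW candidate, so the definition becomes dead.
bool shouldFoldIntoRmw(bool targetHasRmwAddrMode,
                       const std::map<int, int>& counts, int lcl, int usesInRmw) {
    if (!targetHasRmwAddrMode) return false;  // would be a pessimization
    auto it = counts.find(lcl);
    int total = (it == counts.end()) ? 0 : it->second;
    return total == usesInRmw;                // no downstream uses remain
}
```

For STMT00006, `usesInRmw` would be 2, since `*V03 = *V03 + 1` reads `V03` twice.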
Description
I am not sure whether this is a bug, a known issue, or just that my environment is out of date. Anyway, let the code talk:
Generated code:
I know little about the CLR, so I may have seriously underestimated the complexity here. But the `lea` instruction doesn't look very clever, does it? The memory layout will not change, so something like `this.struct1.struct2.struct3.struct4.property` could always be calculated as `[rax+offset]` (if the offset is not too large), couldn't it?

Edit: we do have this, so the problem is fairly limited; I think it should be another peephole optimization.

Configuration
```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Core i9-10900X CPU 3.70GHz, 1 CPU, 20 logical and 10 physical cores
.NET SDK=6.0.201
```
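The point in the description that a field chain such as `this.struct1.struct2.struct3.struct4.property` sits at a fixed offset can be illustrated with C++ value types. The nested types below are hypothetical stand-ins for the C# structs: because every layout is fixed at compile time, the whole chain collapses to one constant, the sum of the per-level offsets, which is what a `[rax + offset]` operand would encode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical nested value types mirroring the field chain in the description.
struct Struct4 { std::int32_t before; std::int32_t property; };
struct Struct3 { std::int64_t x; Struct4 struct4; };
struct Struct2 { std::int32_t y; Struct3 struct3; };
struct Struct1 { std::int16_t z; Struct2 struct2; };
struct Owner   { std::int64_t w; Struct1 struct1; };

// The chain folds into a single compile-time constant.
constexpr std::size_t kPropertyOffset =
    offsetof(Owner, struct1) + offsetof(Struct1, struct2) +
    offsetof(Struct2, struct3) + offsetof(Struct3, struct4) +
    offsetof(Struct4, property);

// Measures the actual distance from the object start to the nested field.
inline std::ptrdiff_t measuredOffset(const Owner& o) {
    return reinterpret_cast<const char*>(&o.struct1.struct2.struct3.struct4.property) -
           reinterpret_cast<const char*>(&o);
}
```

On x86-64, where `int64_t` is 8-byte aligned, the constant works out to 36, small enough for a one-byte displacement.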
Regression?
Data
Analysis
category:cq theme:optimization