Closed EgorBo closed 14 hours ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
@EgorBot -arm64 -profiler
using BenchmarkDotNet.Attributes;

public class Bencha
{
    static object obj = new MyClass();

    [Benchmark]
    public void Bench()
    {
        if (obj is MyClass myClass)
            myClass.DoWork();
    }
}

public class MyClass
{
    public virtual void DoWork() {}
}
@EgorBot -arm64 -profiler
So it seems this is 2x slower, looking at the benchmark results?
I imagine this is too small to actually be measured by BDN and is likely largely dependent on the hardware and surrounding code. We should probably loop in the folks at ARM for official guidance (cc. @TamarChristinaArm).
My guess is that some hardware will "fuse" neighboring movz/movk into a constant on the backend, while others will actually incur construction cost. On all hardware it will likely impact decoding bandwidth.
Conversely, loading from method-local memory has its own downsides, since that memory page is marked "executable". Ignoring that downside, however, the value will likely be in the L1 data cache and incur a load latency of approximately 4 cycles, unless the hardware has an optimization to cache such recent loads in spare registers from the register file (as some x64 chips do).
My guess is it's mostly a wash, and the right choice comes down to whether we're optimizing for size.
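To make the "construction cost" concrete, here is a rough sketch of how many instructions a movz/movk sequence needs for a given 64-bit constant. It assumes a simplified model in which every non-zero 16-bit chunk costs one instruction; a real code generator can do better using movn or logical-immediate encodings. `MovSequence`, `CountMovInstructions`, and the sample constant are hypothetical names and values for illustration, not anything from the JIT.

```csharp
using System;

public static class MovSequence
{
    // Simplified model (an assumption, not the JIT's actual logic):
    // one movz for the first non-zero 16-bit chunk and one movk for each
    // further non-zero chunk. Zero still needs a single movz.
    public static int CountMovInstructions(ulong value)
    {
        int count = 0;
        for (int shift = 0; shift < 64; shift += 16)
        {
            if (((value >> shift) & 0xFFFF) != 0)
                count++;
        }
        return Math.Max(count, 1);
    }
}

public static class MovSequenceDemo
{
    public static void Main()
    {
        // Hypothetical pointer-like constant with three non-zero chunks.
        ulong handle = 0x00007FFA12345678UL;
        Console.WriteLine(MovSequence.CountMovInstructions(handle)); // prints 3
    }
}
```

Under this model a typical 48-bit address costs three 4-byte instructions per use, which is the decode bandwidth being traded against a single load.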
@EgorBot -arm64 -profiler
using BenchmarkDotNet.Attributes;

public class Bencha
{
    static object obj = new MyClass();

    [Benchmark]
    public void Bench()
    {
        if (obj is MyClass myClass1)
            myClass1.DoWork();
        if (obj is MyClass myClass2)
            myClass2.DoWork();
        if (obj is MyClass myClass3)
            myClass3.DoWork();
        if (obj is MyClass myClass4)
            myClass4.DoWork();
        if (obj is MyClass myClass5)
            myClass5.DoWork();
        if (obj is MyClass myClass6)
            myClass6.DoWork();
    }
}

public class MyClass
{
    public virtual void DoWork() {}
}
Yeah, I am not planning to go further with this PR, but it might be interesting to see how the performance differs. E.g. for jump stubs we use ldr instead of movz/movk.
Experiment: I am just curious about the size wins and performance impact
SPMI diffs:
Note, however, that the diffs don't take the data section into account: each constant costs 8 bytes plus potential alignment padding, although a single shared constant can serve multiple places in a method. On average it's still a size win.
Unfortunately, it doesn't make much sense for R2R/NAOT, since those mostly use relocatable constants and rarely need raw 64-bit constants.
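The size trade-off described above can be sketched as simple arithmetic. This is a back-of-the-envelope model under stated assumptions, not what SPMI measures: every ARM64 instruction is 4 bytes, the load is a single pc-relative ldr, the constant takes 8 bytes in the data section once per method, and the padding value is supplied by the caller. `ConstantSize`, `InlineBytes`, and `LiteralBytes` are hypothetical helpers.

```csharp
using System;

public static class ConstantSize
{
    const int InstructionBytes = 4; // fixed-width ARM64 encoding

    // Bytes spent when each use materializes the constant with movz/movk.
    public static int InlineBytes(int uses, int movsPerUse)
        => uses * movsPerUse * InstructionBytes;

    // Bytes spent when each use is one ldr and the 8-byte constant is
    // stored once in the method's data section (plus alignment padding).
    public static int LiteralBytes(int uses, int paddingBytes)
        => uses * InstructionBytes + 8 + paddingBytes;
}

public static class ConstantSizeDemo
{
    public static void Main()
    {
        // Six uses of a three-instruction constant, worst-case 4 bytes of padding.
        Console.WriteLine(ConstantSize.InlineBytes(6, 3));  // prints 72
        Console.WriteLine(ConstantSize.LiteralBytes(6, 4)); // prints 36

        // A single use of a two-instruction constant: the load is larger.
        Console.WriteLine(ConstantSize.InlineBytes(1, 2));  // prints 8
        Console.WriteLine(ConstantSize.LiteralBytes(1, 0)); // prints 12
    }
}
```

The second case shows why the win is only "on average": a short constant used once is cheaper to build inline, while a shared constant amortizes its 8 data bytes across every use in the method.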