dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.99k stars 4.67k forks source link

Property getter not getting inlined #9632

Open Barsonax opened 6 years ago

Barsonax commented 6 years ago

I ran into a performance issue when I used a property for the HCost + GCost field here. One would expect these to be inlined as this makes a big difference in performance in my case and the body is really small.

Without the property the perf numbers are this (benchmarks can be run using this): image

But if I replace the HCost + GCost code with a FCost property that returns HCost + GCost I get this: image

Then I decided to look at the generated assembly and noticed that for the Sortup method (which plays a crucial role in the perf of the pathfinding) that without the property there are 2 call instructions and with the FCost property there are 4 call instructions (the 2 extra calls here are the calls to the FCost property I believe). It also seems that due to inlining further optimizations are possible which could explain the huge difference in benchmark times as I don't think method call overhead alone is responsible for this.

It seems that the compiler inlines everything except the properties for some reason. Manually making a getter method and adding [MethodImpl(MethodImplOptions.AggressiveInlining)] did not make a difference here. Why is the compiler skipping this optimization when it makes so much of a difference here?

Since iam not that good with assembly code (read: I have zero experience) you can find the generated assembly here:

category:cq theme:inlining skill-level:expert cost:small

mikedn commented 6 years ago

As far as I can tell you're using .NET Framework x86. The JIT compiler used by that is not the open sourced RyuJIT but an older, legacy JIT. AFAIR that JIT sometimes does have issues inlining trivial methods. However, it's very unlikely that someone will go and fix that.

It would be useful if you could re-run benchmarks using .NET Framework/Core x64 or .NET Core x86.

Barsonax commented 6 years ago

As far as I can tell you're using .NET Framework x86

But it says ryujit here: image

Am I misunderstanding the config in benchmark dotnet? Do I have to change something in the config here

Also for the assembly I saw clrjit.dll was loaded and compatjit.dll is not loaded.

EDIT: on 64 bit this behavior does not happen and the whole algorithm is even faster which makes me believe it really wasnt running ryujit before. Very weird since I did put that in the config. Might be a bug in benchmarkdotnet. Is there any failsafe way to check what JIT you are really running?

mikedn commented 6 years ago

But it says ryujit here

Yeah, I saw. I'm not very familiar with the output of Benchmark.DotNet but it also says "32 bit LegacyJIT" at the end of the line.

Barsonax commented 6 years ago

Yeah, I saw. I'm not very familiar with the output of Benchmark.DotNet but it also says "32 bit LegacyJIT" at the end of the line.

Jup noticed that too. Very confusing so I made a issue about that here.

So the only way to get ryujit is to use .NET Framework/Core x64 or .NET Core x86? There is not some magic switch somewhere to always use it? I cannot use .NET Core (yet) and I don't want to force everything to be 64 bit.

I do see a small regression in performance for the shorter paths: image

In my case it doesnt really matter a whole lot as its still very fast (and might purely be because of 64 bit causing more bytes to be allocated?) but just a heads up. Now we are comparing anyway overal its quite alot faster than the legacyJIT so nice job on that :). Having a property or field doesnt matter anymore so consider this issue closed.

mikedn commented 6 years ago

So the only way to get ryujit is to use .NET Framework/Core x64 or .NET Core x86? There is not some magic switch somewhere to always use it?

Yep, there's no publicly available RyuJIT for .NET Framework x86.

and might purely be because of 64 bit causing more bytes to be allocated?

Might be, though the amount of memory allocated is very small...

Hard to say, you'd need an x86 RyuJIT to tell for sure.

Barsonax commented 6 years ago

Reopening this issue as it also seems to happen when using x64 RyuJIT but in a different scenario. When the AstarNode struct is like this

The perf numbers are like this: image

However when you change the HCost field to properties the perf numbers get significantly slower: image

EDIT: most of the difference seems to be coming from turning HCost to a property EDIT2: turning HCost to a property will cause CompareTo to be not inlined anymore. AgressiveInlining does fix it though.

mikedn commented 6 years ago

EDIT2: turning HCost to a property will cause CompareTo to be not inlined anymore. AgressiveInlining does fix it though.

Yes, that's exactly what happens. Unfortunately RyuJIT thinks that CompareTo is too large to inline when HCost is changed to a property.

I suppose @AndyAyersMS may want to take a look at this. I posted the relevant part of a JIT dump of SortUp here: https://gist.github.com/mikedn/875b00655cf77a8fb19aa75da25cc6cd

It's kind of unfortunate that even today the JIT has problems inlining trivial property getters/setters.

AndyAyersMS commented 6 years ago

RyuJit makes profitability assessments solely by scanning the IL of the potential inlinee. This analysis isn't transitive -- it does not resolve tokens in the callee's IL or look through calls that the inlinee makes to see if they are also good inline candidates. So the jit has at best a limited and conservative view of the profitability of the inlinee.

Heuristics that look more deeply and more comprehensively are possible but given the throughput constraints the jit faces, these can only be used selectively. Tiered jitting might allow time for more careful examination.

Along the same lines, the fact that tiering has identified a method as frequently called might allow the jit to simply inline more aggressively and not bother with the more detailed analysis. I started down this path over in dotnet/coreclr#15046 and will come back to it fairly soon.

But those possible improvements are a ways off, especially for desktop CLR. In the meantime, you can always use AggressiveInlining to tell the jit things that it arguably should be able to figure out for itself.

AndyAyersMS commented 6 years ago

And if you're curious about attempts to improve the current IL based heuristics, you might find this writeup interesting.

mikedn commented 6 years ago

As a side note: .NET's Compare "design" is rather unfortunate performance wise. I haven't looked close enough at your code to tell if you have a choice or not (and there's a pretty good chance you don't due to how pervasive Compare is in .NET). But if you do have a choice and want best performance then you could try to use a "Less" predicate instead of a "Compare" predicate, it's quite a bit cheaper.

Barsonax commented 6 years ago

As a side note: .NET's Compare "design" is rather unfortunate performance wise.

Did a quick test as the only place where the CompareTo method is used currently is in the maxheap (purely to get the next 'best' node for pathfinding). Iam not interested if a node has a 'equal' value just need to know if its better or not. In early testing I saw a +20% difference so nice find :). Now its like this:

if (FCost == other.FCost)
{
       return HCost < other.HCost;
}
return FCost < other.FCost;