Examples where heavy intrinsics usage runs into internal jit limits on optimization

dotnet / runtime

Examples where heavy intrinsics usage runs into internal jit limits on optimization #11905

Open AndyAyersMS opened 5 years ago

AndyAyersMS commented 5 years ago

Tracking issue for cases where heavy intrinsics usage leads to poor optimization because methods hit various internal jit limits.

dotnet/runtime#11744 inlining stops because the inlining budget was exceeded (partially addressed by dotnet/coreclr#21893)
dotnet/runtime#11903 inlining stops after hitting "too many locals" limit
aspnet/AspNetCore#7724 inlining stops after hitting "too many locals" limit (and no/few hw intrinsics)

category:cq theme:inlining skill-level:expert cost:medium

saucecontrol commented 5 years ago

I closed dotnet/runtime#11903 because it's being addressed in a different way. However, even without the regression caused by the HWIntrinsics API change, that example was still very close to the JIT throttling limits without being absurdly complex. I wanted to bring over @AndyAyersMS's comment from that issue so it doesn't get lost, as it would be a good compromise solution for these cases.

The limits are there to prevent jit algorithms from taking up too much memory, too much time, or both. Perhaps we could tie increasing the limits into AggressiveOptimization so we have a better idea that the performance of a method is deemed critical and so optimizing it is worth the extra jit time and memory.
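For readers unfamiliar with the attribute, here is a minimal sketch of how a method is marked with AggressiveOptimization today. The class and method names are hypothetical and the body is deliberately trivial. Note the assumption: the attribute is shown only as the existing opt-in signal; tying the jit limits to it is the proposal quoted above, and nothing in this thread says the limits currently change for such methods.

```csharp
using System;
using System.Runtime.CompilerServices;

static class HotPath
{
    // Hypothetical performance-critical method. AggressiveOptimization tells
    // the runtime to jit it with full optimization rather than starting in a
    // lower tier. Whether jit limits should also be raised for such methods
    // is the idea under discussion, not documented behavior.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    public static int SumSquares(ReadOnlySpan<int> values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v * v;
        }
        return total;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(HotPath.SumSquares(new[] { 1, 2, 3 }));
    }
}
```

The attribute is a per-method signal, which is what makes it attractive here: the extra jit time and memory would be spent only on methods the author has explicitly flagged.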

benaadams commented 4 years ago

@AndyAyersMS has this become more problematic now that Arm paths are being added, or are the unsupported .IsSupported paths dropped early?

AndyAyersMS commented 4 years ago

I think we're ok. Early pruning helps. Also, the jit creates temps for inlinee args and locals lazily as it imports the inlinee, so increasing the number of locals in a method (say, because C# now sees much more code) should not be a problem, provided only a subset of them can be reached on any particular architecture.

@kunalspathak did some checking to make sure that adding arm specialization to methods that already have xarch specialization didn't cause any changes in the xarch code.
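The shape of code being discussed can be sketched roughly as follows. This is an illustrative example, not code from any of the linked issues: the class and method names are hypothetical, and it assumes .NET 5 or later, where both the x86 and Arm intrinsics namespaces are available. Each IsSupported check is a constant for the target the jit is compiling for, so only one branch is expected to survive on a given architecture.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;
using System.Runtime.Intrinsics.X86;

static class VectorMath
{
    // Hypothetical helper with an xarch path, an Arm path, and a fallback.
    public static Vector128<int> Add(Vector128<int> a, Vector128<int> b)
    {
        if (Sse2.IsSupported)
        {
            // xarch specialization
            return Sse2.Add(a, b);
        }

        if (AdvSimd.IsSupported)
        {
            // Arm specialization
            return AdvSimd.Add(a, b);
        }

        // Software fallback for targets with neither instruction set.
        return Vector128.Create(
            a.GetElement(0) + b.GetElement(0),
            a.GetElement(1) + b.GetElement(1),
            a.GetElement(2) + b.GetElement(2),
            a.GetElement(3) + b.GetElement(3));
    }
}

static class Program
{
    static void Main()
    {
        Vector128<int> sum = VectorMath.Add(
            Vector128.Create(1, 2, 3, 4),
            Vector128.Create(10, 20, 30, 40));
        Console.WriteLine(sum);
    }
}
```

All three paths compute the same result, so the sample behaves identically wherever it runs; the difference is only in which branch the jit keeps.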

SingleAccretion commented 3 years ago

So that this doesn't get lost. From #48669:

We may want to reevaluate this limit. Last time we looked (~5 years ago) there were very few methods that came near. But perhaps things have changed.

I have collected some quick data from the PMI diffs of the shared framework (for win-x64). It looks like the situation is still that most methods have a relatively small number of locals.

Locals        Methods   % of methods
0    - 100  : 352956  : 99.230%
100  - 200  : 2136    :  0.601%
200  - 300  : 382     :  0.107%
300  - 400  : 138     :  0.039%
400  - 500  : 31      :  0.009%
500  - 2334 : 51      :  0.014%
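As a quick cross-check of the percentages, the per-bucket shares can be recomputed from the method counts in the table. This is only a sketch: the class and method names are hypothetical, and the counts are copied from the table above rather than re-measured.

```csharp
using System;
using System.Linq;

static class LocalsHistogram
{
    // Method counts per bucket, copied from the table above.
    public static readonly long[] Counts = { 352956, 2136, 382, 138, 31, 51 };

    // Percentage of all methods that falls into each bucket.
    public static double[] Shares(long[] counts)
    {
        long total = counts.Sum();
        return counts.Select(c => 100.0 * c / total).ToArray();
    }

    // Number of methods outside the first (0 - 100 locals) bucket.
    public static long MethodsBeyondFirstBucket(long[] counts)
    {
        return counts.Skip(1).Sum();
    }
}

static class Program
{
    static void Main()
    {
        foreach (double share in LocalsHistogram.Shares(LocalsHistogram.Counts))
        {
            Console.WriteLine($"{share:F3}%");
        }
        Console.WriteLine(LocalsHistogram.MethodsBeyondFirstBucket(LocalsHistogram.Counts));
    }
}
```

By these counts, 2738 of the 355694 methods fall outside the first bucket.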