dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.93k stars 4.64k forks source link

[Proposal] An API for providing compiler hints to the JIT/AOT #24593

Open tannergooding opened 6 years ago

tannergooding commented 6 years ago

Rationale

There are a number of optimizations which require complex or in-depth analysis. In many cases, these optimizations are not possible to make. This can be due to time constraints (JIT), due to requiring an assumption to be made (JIT and AOT), or due to some other reason.

However, the developer often knows what assumptions are safe to make. This can be due to the way they wrote their algorithm, some check they performed higher in the code, etc.

As such, I propose we provide a set attributes and APIs which can help the compiler (JIT or AOT) produce more optimal code.

Proposed APIs

The proposed API here is very simplistic and represents a core use-case for the new Hardware Intrinsics APIs.

There are multiple other additions that may be equally useful, but they should likely be reviewed on a case-by-case basis with their own supporting arguments.

namespace System.Runtime.CompilerServices
{
    public static class Assume
    {
        public static void Alignment(void* address, byte alignment);
    }
}

Alternative Class Names

CompilerHints was the initial suggested name of the type which would hold these APIs. However, there was some concern over whether some of the suggested APIs actually qualified as Hints, since they may be required to be correct and may result in failure or incorrect data at runtime.

The general feedback (so far) is that these are assumptions not asserts. @CarolEidt gave a good explanation of the difference here: https://github.com/dotnet/corefx/issues/26188#issuecomment-356147405

Other names suggested include:

Additional Details

All assumptions should be named. That is, there should be no generic Assume(bool cond) method exposed now or in the future. These assumptions are meant to convey an explicit optimization to the compiler and should not be left to interpretation.

Potentially as an implementation detail, preferably as part of the contract: Any assumption made by these APIs should be verified when compiling with optimizations disabled.

Example Case 1

In many high-performance algorithms the user will be reading/writing memory in 16 or 32-byte chunks by utilizing the SIMD registers available to the hardware.

It is also generally beneficial to such algorithms to ensure that their reads/writes are aligned as it ensures a read/write never crosses a cache or page boundary (which can incur a heavy performance penalty). Additionally, on older hardware (and some architectures) the unaligned read/write instruction (movups on x86) is not as fast as the aligned read/write instruction (movaps on x86) or the reg, reg/mem encoding of the various instruction may require that mem be aligned.

However, the GC does not currently support arbitrary alignments for objects and it is impossible to guarantee alignment without pinning and then checking the alignment of the memory. As such, users generally write their algorithm such that it does a single unaligned read/write, adjusts the offset so that the new address is aligned, and then operates on the rest of the data using aligned reads/writes.

In some scenarios, it may be difficult or impossible to determine that the pinned address is actually aligned (such as if an address is pinned and then passed to some other method to be processed), so the compiler may not emit the most optimal code. As such, providing an AssumeAlignment(address, alignment) that tells the compiler to emit code as if the address was aligned would be beneficial.

Example Case 2

NOTE: This is not currently proposed, but represents another annotation that having compiler support for may be useful.

In many cases, a user may write a method that is considered Pure (the method does not modify state and always returns the same input for a given output).

In simplistic cases, the compiler may perform constant folding or inlining and the actual computation may not be done. However, in more complex cases the compiler may need to do more in-depth analysis which may prevent it from caching the return value or inlining the method.

As such, providing a PureAttribute (similarly to the Contract.PureAttribute, but actually consumed by the compiler) would be useful as it would allow the compiler to cache the return value and optimize away subsequent calls for a given input.

Existing IL Support for skipping fault checks

There are currently three concrete fault checks which the CLI specifies can be skipped using the no. prefx.

This prefix indicates that the subsequent instruction need not perform the specified fault check when it is executed. The byte that follows the instruction code indicates which checks can optionally be skipped. This instruction is not verifiable.

The prefix can be used in the following circumstances:

0x01: typecheck (castclass, unbox, ldelema, stelem, stelem). The CLI can optionally skip any type checks normally performed as part of the execution of the subsequent instruction. InvalidCastException can optionally still be thrown if the check would fail.

0x02: rangecheck (ldelem., ldelema, stelem.). The CLI can optionally skip any array range checks normally performed as part of the execution of the subsequent instruction. IndexOutOfRangeException can optionally still be thrown if the check would fail.

0x04: nullcheck (ldfld, stfld, callvirt, ldvirtftn, ldelem., stelem., ldelema). The CLI can optionally skip any null-reference checks normally performed as part of the execution of the subsequent instruction. NullReferenceException can optionally still be thrown if the check would fail.

The byte values can be OR-ed; e.g.; a value of 0x05 indicates that both typecheck and nullcheck can optionally be omitted.

Although these are supported by IL, high level compilers may not emit them. It is potentially useful to expose Assume.IsType, Assume.InRange, and Assume.NotNull checks so that all languages (which compile to IL) can benefit from skipping these fault checks (without having to undergo an IL rewriting step).

Additional Notes

There are many other potential hints or attributes which can be useful and for which the class could be extended.

Some of these represent assumptions that a higher level compiler (i.e. C#) cannot make (due to not being able to do things like cross-assembly optimizations, or being required to emit certain IL).

Joe4evr commented 6 years ago

always returns the same input for a given output

πŸ€”

More seriously, I've occasionally been thinking about something like this, but I have no idea if the scenarios I have in mind are even that much of a perf impact.

tannergooding commented 6 years ago

I think its the type of optimization where a big impact depends on your API. For example, a method which is pure but complex to compute, such as sin, cos, or tan can provide big impact.

But, a thousand tiny improvements can still add up to a lot as well, so it can also be useful for small APIs.

tannergooding commented 6 years ago

FYI. @fiigii, @4creators, @mikedn, @CarolEidt, @jkotas

I've proposed this feature as a general way for developers to provide hints to the JIT/AOT about their code.

The only API for the feature I've proposed at this time would give users a way to tell the JIT to emit movaps instructions (or the reg/mem encoding for an instruction on legacy hardware, such as is used by the Benchmark Games -- CC @danmosemsft, @eerhardt)

Otherwise, even in the case where the user has pinned a managed array, and offset their ptr appropriately, it is still expensive for the JIT to determine that the logic will result in the code being always aligned and that it is safe to emit the better code.

4creators commented 6 years ago

This feature is IMO long overdue. It can be helpful in a lot of scenarios not only for HW intrinsics. I would even expand this proposal with RuntimeHelpers.ReoptimizeMethod.

RuntimeHelpers.ReoptimizeMethod could use all hints which are statically available with HintAttributes but may provide even wider functionality i.e. ask for given tier of tiered optimizations, ask for ISA target etc.

For discussion see https://github.com/dotnet/coreclr/issues/15427#issuecomment-351794198 and https://github.com/dotnet/coreclr/issues/15522

CarolEidt commented 6 years ago

I would not consider these two cases to fall into the category of compiler "hints". I would expect a hint to be something that the compiler could use to guide its choices (e.g. whether to inline), but which would have no correctness requirement. For the alignment case, at least the worst that could happen is that the program would fault if the alignment were not correct (and the developer would then know they'd got it wrong). For the "Pure" case, it could get completely wrong results, and be much more difficult to debug. That said, I think these are worth considering - we should just call them something other than "hints".

tannergooding commented 6 years ago

we should just call them something other than "hints".

Fair enough. Now to come up with a better name πŸ˜„

4creators commented 6 years ago

My proposals πŸ˜„ would be than:

CompilerConfig, CompilationConfig, CompilerSetting, CompilationSetting - always with Attribute at the end.

tannergooding commented 6 years ago

always with Attribute at the end.

There also needs to be a name for the static class which is used to provide information inline with code (such as for a local, where attributes aren't allowed).

jkotas commented 6 years ago

Similar proposal: https://github.com/dotnet/coreclr/issues/2725

tannergooding commented 6 years ago

Similar proposal: dotnet/coreclr#2725

πŸ‘.

I would opt for named assumptions over a general Assume, however. At the very least it makes the intent or goal of the assumption clearer (Assume(p & 0x1F) could mean any number of things, but AssumeAlignment(p, 32) makes it clear that we care about alignment and that the compiler can directly interpret it as such). I think named assumptions also covers @CarolEidt's comment about capturing the expected behavior and being able to validate it.

I do think putting them in Unsafe makes sense.

tannergooding commented 6 years ago

I've updated the OP to suggest using Unsafe, rather than CompilerHints.

4creators commented 6 years ago

Yet, I would still allow general Unsafe.Assume for predefined cases like in comment by @GSPP:

This can be tremendously useful because it's a way turn off any JIT safety. x != null turns off a null check. index >= 0 && index < array.Length turns off a bounds check

tannergooding commented 6 years ago

Yet, I would still allow general Unsafe.Assume for predefined cases

The point would be to provide named assumptions (e.g. AssumeNotNull and AssumeInRange) for any predefined cases and to not provide a general Assume.

A general Assume would look something like void Assume(bool condition), a user calling it would end up writing Assume(val != null), and the IL generated would be something like:

ldarg.1
ldnull
cgt.un
call void Unsafe::Assume(bool)

This requires the JIT to support and itself assume that the above IL sequence (and any other similar IL sequences that may be emitted by other compiles) meant "I am not null". It also requires the compiler to not optimize such a check away (C# is adding non-nullable types, whose to say they don't optimize the check to ldc.i4.1 and make it so the JIT doesn't know what is being assumed).

However, an explicit AssumeNotNull check makes the intent obvious to the JIT. The signature might look something like void AssumeNotNull<T>(T val) where T : class and would be called as AssumeNotNull(val), the IL generated would be something like:

ldarg.1
call void Unsafe::AssumeNotNull<TypeOfVal>(!!0)

A high level compiler (such as C#) also has no way of optimizing this call away itself so there is no worry about the JIT not being able to determine what the user was wanting.

4creators commented 6 years ago

This requires the JIT to support and itself assume that the above IL sequence (and any other similar IL sequences that may be emitted by other compiles) meant "I am not null"

Got it. It is OK than as simpler solution from perspective of Jit support.

ltrzesniewski commented 6 years ago

The current purpose of the System.Runtime.CompilerServices.Unsafe class is to perform various memory operations (read/write/copy/zero etc) or casts that the language wouldn't otherwise allow.

Adding assumptions there doesn't look like a very good fit to me. It rather seems to be a violation of the single responsibility principle.

I'd dedicate a new class for this, here are some naming ideas:

tannergooding commented 6 years ago

here are some naming ideas

Thanks for the suggestions! I'll update the OP to include a list of suggested names sometime later tonight.

Based on the discussions here, and a couple I've had offline, the general idea of the proposal looks to be sound (we want a way to suggest optimizations to the compiler and we want those to be named/verifiable).

The actual name of the class that holds these methods will likely undergo some discussion during the API review session (which will happen sometime after we mark this as ready-for-review).

benaadams commented 6 years ago

If they are to be trusted rather than guidance it should be Assertion rather than Assumption e.g. UnsafeAssertions.IsNotNull(foo)

ltrzesniewski commented 6 years ago

@benaadams when you say Assertion, I immediately think of System.Diagnostics.Debug.Assert, whose role is basically the opposite of this proposal... That's a dangerous confusion to make.

CarolEidt commented 6 years ago

Although I think that Assertion is probably a more accurate description, its usage has generally been in conjunction with conditions that are expected to be validated by a language/compiler/runtime, rather than something that is expected to be assumed without verification (except possibly by a validation tool). I would be more in favor of one of the Assume variants above.

tannergooding commented 6 years ago

@CarolEidt, is your recommendation then that we do not perform any validation of the assumption when optimizations are disabled then?

Based on your statement on the CoreCLR issue (https://github.com/dotnet/coreclr/issues/2725#issuecomment-173028054), I had taken it that we should validate the condition under debug (however, that was 2 years ago and things may have changed).

CarolEidt commented 6 years ago

@tannergooding - thanks for reminding me of that. I still think it would be useful and probably desirable to validate the condition under debug. However, even in that context, I don't feel that the verb "assert" is quite strong enough; it says "I claim this is true". To me, "assume" says "I claim this is true and you (the compiler/JIT/runtime) can take actions based on that claim".

4creators commented 6 years ago

IMO Assume.NotNull(foo) has a closest semantic meaning to and best expresses actions taken by both developer and compiler. In case we show the fully qualified name it is even more self documenting:

System.Runtime.CompilerServices.Assume.NotNull(foo)

tannergooding commented 6 years ago

I'm going to tag @jaredpar and @MadsTorgersen explicitly here as well.

This will primarily be intended for the lower compilers (JIT/AOT, etc), but it would be nice to see if the higher level compilers (C#, VB, F#, etc) have any requirements or scenarios that this might prevent/harm/etc, or if there are any scenarios for which this would be advantageous (i.e. Assume.NotNull could, theoretically, be taken advantage of for non-nullable refs)

tannergooding commented 6 years ago

Updated the OP with:

tannergooding commented 6 years ago

Tagging @joperezr, @AlexGhiondea as area owners.

Is there any other feedback here before I mark as api-ready-for-review?

jkotas commented 6 years ago

You should update the proposed API at the top post to what you want folks to review.

tannergooding commented 6 years ago

You should update the proposed API at the top post to what you want folks to review.

πŸ‘, Updated it to be Assume.Alignment(void* address, byte alignment) instead of Unsafe.Alignment (thought I had done this already)

Rest of the OP looks to be as intended.

CarolEidt commented 6 years ago

Is there any other feedback here before I mark as api-ready-for-review?

Looks good to me.

jkotas commented 6 years ago

If this is just about Assume.Alignment, I think we should have a proof of concept that shows the performance benefits on something semi-real first. I do not think there will be a problem with getting the API approved if there are data to demonstrate that it is useful.

FWIW, what I have been hearing is that the performance penalty of the instructions that handle misaligned access is very small on modern platforms and that it is not something to really worry about.

tannergooding commented 6 years ago

FWIW, what I have been hearing is that the performance penalty of the instructions that handle misaligned access is very small on modern platforms and that it is not something to really worry about.

There is basically no difference between the movaps and movups (aligned vs unaligned) instructions on modern hardware (this is roughly Nov 2008 onward, as the improvements came with the Nehalem micro-architecture).

However, on non-AVX hardware (anything prior to Sandy Bridge, which came out Jan 2011) you cannot use the reg, mem encoding for most instructions (as the mem operand must be aligned).

Any machine running on post Sandy Bridge (post Bulldozer for AMD, which was released Oct 2011) micro-architecture can freely use the VEX encoding (reg, reg, mem) with any alignment.

However, even on modern hardware, there is a non-trivial performance hit if you read unaligned data that crosses a cache line or page boundary, so it is important to ensure reads/writes (outside the first) are aligned anyways.

Additionally, the Benchmark Game performance tests (CC. @danmosemsft, @eerhardt) are run on legacy hardware (specifically a Q6600 from the Kentsfield micro-architecture) for which the movups instruction has a large performance penalty (as compared to using the movaps instruction).

CC. @fiigii as he may be able to provide more details or correct any mistakes I made with the timelines.

benaadams commented 6 years ago

However, even on modern hardware, there is a non-trivial performance hit if you read unaligned data that crosses a cache line or page boundary, so it is important to ensure reads/writes (outside the first) are aligned anyways.

Affects caller responsibility, not emitted instruction or Jit assumption?

So is only pre-2012 CPUs it would affect?

tannergooding commented 6 years ago

Affects caller responsibility, not emitted instruction or Jit assumption?

Yes, this one is caller responsibility. It doesn't impact the instructions emitted (on modern CPUs)

So is only pre-2012 CPUs it would affect?

Yes, it would only affect pre-2012 x86 CPUs. I believe ARM has only gotten better unaligned support more recently (@sdmaclea might know when that came about).

4creators commented 6 years ago

Yes, it would only affect pre-2012 x86 CPUs.

I am not sure that this is the case, as Intel pushes several processor lines with cut down ISA which comprises only SSE up to SSE42 and some smaller set of instructions. Even today you can still buy brand new Pentium and Celeron launched in Q2 2017 without AVX:

https://ark.intel.com/products/97460/Intel-Pentium-Processor-G4620-3M-Cache-3_70-GHz - Kaby Lake https://ark.intel.com/products/122697/Intel-Pentium-Processor-4415Y-2M-Cache-1_60-GHz - Kaby Lake https://ark.intel.com/products/122825/Intel-Celeron-Processor-G3930TE-2M-Cache-2_70-GHz - Kaby Lake

There are no Caffe Lake Pentium or Celerons released yet.

mikedn commented 6 years ago

What exactly is the purpose of assume aligned? Don't we already have intrinsics such as LoadAligned?

tannergooding commented 6 years ago

There are several scenarios (such as having a Vector128<T>[]) where a user may be working with Vector128<T> data that does not require an explicit Load to reference.

Intrinsics are also not the only place where it can be useful. We currently have several other ways of reading/writing large chunks of data (a few of which will inline to SIMD instructions as well).

mikedn commented 6 years ago

There are several scenarios (such as having a Vector128[])

How can you "assume" 16 byte alignment for an array? The GC doesn't guarantee alignment larger than 8 bytes.

tannergooding commented 6 years ago
  1. You pin it.
  2. You determine the alignment
  3. If unaligned, you do a single unaligned read/write
  4. You offset to be aligned
  5. You operate on the rest of the data as aligned
mikedn commented 6 years ago

If you do all that then you can surely simply use LoadAligned as well.

We really should be careful with this "assume" stuff. We need to ensure that:

tannergooding commented 6 years ago

If you do all that then you can surely simply use LoadAligned as well.

Which still doesn't solve the scenario for anyone using Unsafe.Read, Unsafe.Write, or Vector<T>.

We really should be careful with this "assume" stuff.

I agree, any assumptions we add need to have a clear goal/scenario and some level of data supporting why it should exist.

I know on memory alignment, I have seen several things of benchmark data (such as the benchmark games) that indicates we don't perform great on machines without AVX support.

we do not add high risk assumptions (e.g. anything that would turn off array range checks) without having high value use cases for them

I'm not convinced turning off range checks is a high risk assumption. The IL already supports the no. prefix, which can turn off type checking, null checking, and bounds checking (or at least that is what it is documented to do)

Users can already bypass it using unsafe code, and can likely bypass it by writing their own IL. Having a compiler assumption for it just provides an easy way for users to express the same thing without having to deal with pointers and without having to do a IL rewriting pass.

mikedn commented 6 years ago

Which still doesn't solve the scenario for anyone using Unsafe.Read, Unsafe.Write, or Vector.

If Vector<T> has a problem with alignment then it's probably best to fix it in Vector<T>. Adding LoadAligned to it makes more sense than alignment assumptions.

The IL already supports the no. prefix, which can turn off type checking, null checking, and bounds checking (or at least that is what it is documented to do)

Support for those prefixes is optional. The .NET JIT ignores them.

Users can already bypass it using unsafe code.

Yes, and that makes the addition of another way to shoot yourself in the foot more questionable. Especially if the new way is easier to use and easier to get wrong.

sdmaclea commented 6 years ago

ARM has only gotten better unaligned support more recently (@sdmaclea might know when that came about).

ARM alignment performance would be vendor specific. If alignment faults were enabled (they are not) an exception would be very expensive. Otherwise cache line and page crossing penalties are real, but using aligned code is rarely a win. Large block copies are the typical exception.

I did not read the full thread. Some observations.

tannergooding commented 6 years ago

Seems like the assumptions should throw an exception if incorrect. This would encourage the assuming code to be inside a try/catch block.

Assumptions will assert the condition and fail the program if optimizations are disabled. try/catch blocks could potentially harm performance or other optimizations more than missing the assumption would.

Proposed interface marks beginning, but not end of an assumption. Would something like a using assumption block be better.

Other compilers consider assumptions to be correct until the input is modified. I would think the same logic would be applied here.

tannergooding commented 6 years ago

If Vector has a problem with alignment then it's probably best to fix it in Vector. Adding LoadAligned to it makes more sense than alignment assumptions.

Vector<T> isn't the only thing today that has an issue with alignment, and more scenarios are likely to come about in the future as well.

Providing a general purpose mechanism that allows the compiler to detect and handle alignment is likely better than adding customized support to several types. A general, cross-architecture, mechanism also allows 3rd party authors to achieve the same thing without taking a dependency on one of those types or without requiring them to use hardware intrinsics (which might not be a good option since they are architecture specific).

Yes, and that makes the addition of another way to shoot yourself in the foot more questionable. Especially if the new way is easier to use and easier to get wrong.

These will, arguably, be less "shoot yourself in the foot". Validation when optimizations are disable already make this better than any of the alternatives. Having the assumptions additionally be named and having them work with "safe" code will also make it easier to understand the intent and to debug the issue.

We are a JIT compiler, so we have a limited time to do the analysis, and even with an AOT compiler, some analysis is complex and takes a long time to do. As such, we have to accept the fact that there are some optimizations the JIT cannot do without outside input (i.e. the developer).

If I do checks on my inputs at the public entry point and pass those inputs, unmodified, to some other private method which operates on the data, I should be able to tell the compiler "I already checked this, please don't check it again" and the compiler should (preferably) respect that.

mikedn commented 6 years ago

Vector isn't the only thing today that has an issue with alignment

Other examples?

We are a JIT compiler, so we have a limited time to do the analysis, and even with an AOT compiler, some analysis is complex and takes a long time to do. As such, we have to accept the fact that there are some optimizations the JIT cannot do without outside input (i.e. the developer).

That's basically a lot of hand-waving. You have no idea what a JIT can or can't do in a given amount of time. You have no idea if the current JIT is efficient or not. You have no idea if the fact that the current JIT fails to perform some useful optimizations because it doesn't have time or because it's basically a pile of crap. You have no idea if the JIT could actually make effective use of the proposed assumptions. You have no idea if the performance improvements you may get from implementing such assumptions are significant enough to be worth the risk.

CarolEidt commented 6 years ago

@mikedn - although there's a lot we don't know about what the JIT could or couldn't do, especially if it were engineered as well as it could possibly be, I think there are a couple of things we can say:

That said, you make a very good point that we should be doing proof-of-concept work to validate the benefit of any such hints that we might propose for .NET.

mikedn commented 6 years ago

it is a near certainty that the JIT can't do as much as we would like it to, even with a lot more investment in throughput and optimizations

Yes, of course. But one of the problems with these assumptions is that the JIT may not be able to handle them very well for the same reasons it does not handle well other optimizations - lack of time, problematic design and implementation.

I've ran into such issues already. For example, I added the (uint)i < (uint)array.Length trick and then discovered that it doesn't quite work when used in loops. And it's possible to make it work but it's not trivial because the range check phase is poorly written. Not to mention that recently an attempt at using this trick uncovered a pretty nasty bug in the JIT related to range check elimination. And that fixing the bug resulted in the JIT being more conservative around range check elimination.

And when people try to use "assume" in their code and find that the JIT doesn't perform the desired optimizations what do we do? Go back to the drawing board and introduce new ways to fill their needs? While not necessarily bad I'm not sure this is a good idea.

AndyAyersMS commented 6 years ago

I'm sympathetic to this kind of proposal but also have seen this kind of thing go wrong in many ways. So I'd prefer it if we had an actual end to end implementation that we could evaluate.

jkotas commented 6 years ago

@joperezr There is no point in sending this to API review until there is a proof of concept that demonstrates that this API is actually useful.

tannergooding commented 6 years ago

I have a good use case for this. See https://github.com/dotnet/coreclr/pull/16095#issuecomment-361634502 for better details.

Basically, we cannot fold LoadAligned on newer hardware because it could mask an AccessViolation that would otherwise occur.

Assume.Aligned would allow the JIT to perform the fold anyways (and would avoid the need for the user to maintain two versions of the algorithm, one that uses LoadAligned and one that uses just Load, if they want to support hardware without VEX support).

redknightlois commented 6 years ago

@AndyAyersMS @CarolEidt This is an actual assumption that is proven in practice to help even on highly optimized C++ compilers: https://github.com/dotnet/coreclr/issues/6024 I for once have several use cases for this one, so if you want to code a prototype we have actual code that you can benchmark.