tannergooding opened 7 years ago
FYI. @JosephTremoulet
So after re-reading the sections a few times, I'm fairly positive that this is supported under the pretext of E-relaxed methods/sequences/checks.
There are specific sections that not only call out that E-sequences
can span across methods, but that these optimizations are specifically for things like allowing methods inlined by the JIT to be at least as fast as manually inlined methods.
@tannergooding, I think you're combining "side-effects" and "checks"/"exceptions" into a single concept in a way that the spec does not, and that this may be the cause of your incorrect reading of it. Side-effects and checks are separate throughout the language quoted above. Relaxation allows reordering some checks across each other and across side-effects. This means that, in terms of observable differences, when some check fails, some side-effects may be suppressed, and other (prior) checks may appear to be suppressed. There is nothing in here about suppressing checks, except in that they are implicitly suppressed when reordered after a subsequent check that fails.
Also, a "manual inlining" of your example:
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
return Select(condition, a[5], b[6]);
}
would produce
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
bool _condition = condition;
int _trueValue = a[5];
int _falseValue = b[6];
int _result = _condition ? _trueValue : _falseValue;
return _result;
}
Further modifications that alter the semantics (by making the type safety checks conditional) are rewrites that produce semantically different programs, not "inlining".
Why there's discussion of interleaving instructions and being "as fast as" manual inlining is e.g. if the call were in a loop, you'd want to be able to hoist operations from the inlinee out of the loop even though doing so moves them past instructions from the caller that remain in the loop.
@CarolEidt could speak with more authority than I can as to the intent and details of the spec. The current JITs don't actually pay attention to the relaxed exception control bits, so this is somewhat academic without changing the JIT, and I don't know the history of why we have this complicated bit of spec that we don't actually take advantage of in codegen.
I think you're combining "side-effects" and "checks"/"exceptions" into a single concept in a way that the spec does not, and that this may be the cause of your incorrect reading of it.
@JosephTremoulet. I might be misunderstanding, but I don't think my proposal would be suppressing the check, merely moving it so that it is only executed if the variable in question is actually accessed.
From VI.F.1
Authors of strict methods can reason about their behavior without knowing details about whether their callers or callees are relaxed, because strict instructions act as a fence. On the other hand, we want calls from E-relaxed methods to E-relaxed methods to be inlinable “as if” they were inlined by hand at the source level. That is why an E-relaxed sequence is allowed to span between methods.
It explicitly states that they want calls from E-relaxed methods to E-relaxed methods to be inlinable "as if" they were inlined by hand at the source level.
I don't know anyone who would inline it by hand to be (as this would be a literal inlining of the raw IL):
int MyMethod(bool condition, int[] a, int[] b)
{
bool _condition = condition;
int _trueValue = a[5];
int _falseValue = b[6];
int _result = _condition ? _trueValue : _falseValue;
return _result;
}
You would inline it by hand to be:
int MyMethod(bool condition, int[] a, int[] b)
{
return condition ? a[5] : b[6];
}
A conforming implementation of the CLI is free to change the timing of relaxed E-checks in an E-relaxed sequence, with respect to other checks and instructions as long as the observable behavior of the program is changed only in the case that a relaxed E-check fails.
My understanding of this is that the implicit check from an instruction can be reordered anywhere, provided that the observable behavior is only changed in the case that the check fails.
So, unless a range check has an observable side effect in the case it succeeds, we should be free to move the check to execute at a later point in the program.
That is, if `condition` is false and `a[5]` would fail (either because `a` is `null` or because 5 is out of range), we do not have to fail the execution of the method, since `a` is not actually accessed in this code path (provided the appropriate relaxations are in place).
Aha. You're interpreting "reorder" and "suppress" in terms of the static view of how the program is rewritten. I believe the spec is discussing the operational semantics, and that "reorder" and "suppress" are meant in terms of the dynamic stream of visible events that the executing program might trigger. So "suppress" means "remove from the dynamic stream", not "remove from the static view of the program", and "may be reordered" means "may occur in a different order in the dynamic event stream", not "may have one or the other suppressed from the dynamic stream so long as it is still present somewhere in the static view of the program".
So today, we have the following streams
MyMethod:
Load condition
Null Check A
Range Check A
Load a[5]
Null Check B
Range Check B
Load b[6]
Call Select
Return
Select:
Load Condition
Branch on False to F
T:
Load trueValue
Return
F:
Load falseValue
Return
Today, under the normal requirements, it is transformed to:
MyMethod:
Null Check A
Range Check A
Null Check B
Range Check B
Load Condition
Branch on False to F
T:
Load A[5]
Return
F:
Load B[6]
Return
My proposal is that, following the relaxations we transform it to:
MyMethod:
Load Condition
Branch on False to F
T:
Null Check A
Range Check A
Load A[5]
Return
F:
Null Check B
Range Check B
Load B[6]
Return
So we are not "suppressing" anything (removing it from the dynamic stream); we are only "reordering" (it may occur in a different order). Because the Null Check and the Range Check have no side-effect in the case they succeed, this is allowed.
I do not think the reordered sequence must guarantee that the exception is still thrown in all possible code paths. If that were the case, then reordering the check elsewhere is effectively pointless.
Users (especially users who care about performance) would much rather the method fail earlier rather than later.
As such, I believe that this (in addition to all the mentions of the generated code being as optimal as hand-inlining the method in the original source) means my view is correct. The check can be reordered to be later in the stream and is not required to be executed in all subsequent code paths; merely, the check must still exist and must still provide the appropriate protection to the relevant operation (in this case, the null and range checks must still exist before accessing an element of the array, but they do not have to exist in a path where you do not access said array).
Talking past each other... maybe I should have said "trace" instead of "stream"? The reasoning goes something like this:
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
return Select(condition, a[5], b[6]);
}
For each possible set of inputs, consider:
2.1. What dynamic trace of events does the unoptimized program execute?
2.2. What dynamic trace of events does the optimized program execute?
The optimization is legal only if, for all possible sets of inputs, the traces from 2.1 and 2.2 are the same up to whatever latitude the spec dictates.
In this case, if the input has `condition` true, `a` some array of length greater than 5, and `b` some array of length greater than 6, then the unoptimized trace is:

(load local arguments); null check and bounds check on "a" that pass; load a[5] from memory; null check and bounds check on "b" that pass; load b[6] from memory; test "condition"; return the value that was loaded from a[5]

and the optimized trace is:

(load local arguments); test "condition"; null check and bounds check on "a" that pass; load a[5] from memory; return the value that was loaded from a[5]

and these agree in terms of visible side-effects and exceptions, so hooray.
But if the input has `condition` true, `a` some array of length greater than 5, and `b` null, then the unoptimized trace is:

(load local arguments); null check and bounds check on "a" that pass; load a[5] from memory; null check on "b" that fails; raise NullReferenceException

while the optimized trace is:

(load local arguments); test "condition"; null check and bounds check on "a" that pass; load a[5] from memory; return the value that was loaded from a[5]

and the fact that `raise NullReferenceException` was in the first trace but not the second is what it means to "suppress" that failing check.
@JosephTremoulet, I still don't think that is quite right. That leaves this entire portion of the spec effectively useless.
If a check is going to fail (that is, cause an exception), then moving it to later in the program is effectively useless (additional code will execute, but the function will still fail overall, program execution will be delayed, etc.).
Based on all the surrounding context, I am almost certain that this implies that the check is not required to exist in every subsequent code path, but instead that it is only required to exist in the code paths where it provides the required validation.
That leaves this entire portion of the spec effectively useless.
Not so. Consider this example:
class SomeClass {
    int someField;
    static void addToEach(int[] a, SomeClass o, int start, int stop) {
        for (int i = start; i != stop; ++i) {
            a[i] = a[i] + o.someField;
        }
    }
}
That loop has a null check on `a`, then a bounds check on `a[i]`, then a field load which implicitly does a null check on `o`. But `o.someField` doesn't change in the loop, and loads are expensive, so you'd like the compiler to be able to load `o.someField` before it runs the loop. Doing that would change the observed exception raised from `IndexOutOfRangeException` to `NullReferenceException` in the event that `a` is non-null but `start` is greater than or equal to `a.Length`. With relaxed exceptions, we can hoist it anyway -- the checks get reordered, but in a sense that these sections carefully lay out to be legal.
Loop rewriting is explicitly called out in a separate portion, with such an example.
All the sections about method inlining under relaxations are explicitly separate; they explicitly call out optimizing in a way equivalent to having inlined the code in the original source (not in IL), being allowed to optimize as if the called method were a C/C++-style macro, etc. They all indicate that my assumptions are correct and the check can be moved such that it only exists on the path where it would be required (for safety/verifiability).
The spec says "The check’s exception is thrown some time in the sequence, unless the sequence throws another exception." You're asking for an optimization to ignore that, based not on any normative text but on the basis that the rationale text in another section uses a phrase that you interpret to include this particular semantics-altering rewrite you have in mind. We'll have to agree to disagree.
Yes, but that entirely depends on how you interpret "sequence" (whether it is a single code path, or all code up until the next non-trivial protected or handler region).
There is at least one example in the spec (on phone, will link in a bit) where it defines a sequence to contain a branch.
@CarolEidt, could you comment as to whether my understanding of the spec is correct?
If it isn't, that is fine. I am just wanting to know whether I should update the proposal.
If my understanding is correct and if the attributes are there for use; then even if the JIT doesn't use them today, other tools could (CoreRT could use them for better codegen as well, for example).
If it isn't supported then I would want to update the proposal to indicate that such a thing should be possible (I should be able to opt-in towards having my code inlined, as if I had inlined it by hand, regardless of the normal runtime rules).
I want to chime in a bit on the spec and the value of having access to E-relaxed sequences.
For the purpose of this discussion, let's suppose that the C# compiler transforms the code "canonically", e.g., it does not convert `int a = new int[]{1}[0]` into `int a = 0` and will indeed emit `newarr` and `ldelem` etc. This is important because E-relaxation is about the effect of executing CIL, not the effect of translating C# to CIL.
The OP's initial example:
int IIf(bool condition, int @true, int @false) { return condition ? @true : @false; }
var items = new int[] { 1 };
int result = IIf(items.Length > 1, items[1], items[0]);
If we manually inline the emitted CIL (so method call using CIL is converted to substitution of method CIL body):
var items = new int[] { 1 };
// C# evaluates arguments from left to right.
bool condition = (items.Length > 1);
int @true = items[1];
int @false = items[0];
int result = (condition ? @true : @false);
In this example, it does not matter whether the sequence is ArrayExceptions-relaxed. The result is always an `IndexOutOfRangeException` being thrown.
The spec says:
A conforming implementation of the CLI is free to change the timing of relaxed E-checks in an E-relaxed sequence, with respect to other checks and instructions as long as the observable behavior of the program is changed only in the case that a relaxed E-check fails. If an E-check fails in an E-relaxed sequence:
- The rest of the associated instruction must be suppressed, in order to preserve verifiability. If the instruction was expected to push a value on the VES stack, no subsequent instruction that uses that value should visibly execute.
- It is unspecified whether or not any or all of the side effects in the E-relaxed sequence are made visible by the VES.
- The check’s exception is thrown some time in the sequence, unless the sequence throws another exception. When multiple relaxed checks fail, it is unspecified as to which exception is thrown by the VES.
In OP's example, the sequence of CIL causes the array bounds check to fail at `items[1]`, and it does not cause any other failed checks or exceptions. Therefore, the spec mandates that `IndexOutOfRangeException` be thrown.
The rest of this comment is based on my understanding of the spec. Let me first demonstrate one interesting possible consequence of E-relaxation, before going to a more practically relevant example.
class S { public static object field = null; }
try
{
var o = new object();
var a = new int[0];
// store 1
S.field = 1;
var foo = (string)o;
var bar = a[1];
// store 2
S.field = 2.0;
}
catch (Exception ex)
{
Console.WriteLine(S.field is int
? "field is int"
: S.field is double
? "field is double"
: S.field == null
? "field == null"
: "other");
Console.WriteLine(ex.GetType());
}
Suppose the sequence is InvalidCastException- and ArrayExceptions-relaxed; then the VES is free to choose from the following behaviors:

1. `field is int` + `InvalidCastException` (= store 1, then check cast)
2. `field is int` + `IndexOutOfRangeException` (= store 1, then check index)
3. `field == null` + `InvalidCastException` (= check cast before trying store 1)
4. `field == null` + `IndexOutOfRangeException` (= check index before trying store 1)

The check’s exception is thrown some time in the sequence, unless the sequence throws another exception. When multiple relaxed checks fail, it is unspecified as to which exception is thrown by the VES.

This means that either `InvalidCastException` or `IndexOutOfRangeException` can be thrown, and one of them must be thrown.
It is unspecified whether or not any or all of the side effects in the E-relaxed sequence are made visible by the VES.
In cases 3 and 4, the side effect of setting `S.field` is not made visible.
However, it is not possible that `field is double`, because
The rest of the associated instruction must be suppressed
The primary usage of E-relaxation is to hoist checks.
void AccumulateFirst10(ref int sum, int howmany, params int[] summands)
{
for (int i = 0; i < howmany; ++i)
{
sum += summands[i];
}
}
int result = 0;
try
{
AccumulateFirst10(ref result, 100, 1, 2, 3);
}
catch (Exception ex)
{
Console.WriteLine(result);
Console.WriteLine(ex.GetType());
}
If `AccumulateFirst10` is ArrayExceptions-strict, then the effect of the call is to set `result` to `6` and throw `IndexOutOfRangeException`. The compiler is highly likely to simply do the array bounds check in every iteration.
However, if the method is ArrayExceptions-relaxed, then it's possible that `result` is `0` --- the native code can be morally equivalent to this:
void AccumulateFirst10(ref int sum, int howmany, params int[] summands)
{
if (howmany < 0) return;
// hoisted check
if (howmany > summands.Length) throw new IndexOutOfRangeException();
for (int i = 0; i < howmany; ++i)
{
// summands[i] is loaded WITHOUT bound check
sum += summands[i];
}
}
The reason is as follows. If `howmany` is greater than `summands.Length`, then the loop will eventually lead to `IndexOutOfRangeException`, in which case the side effect of editing `sum` need not be made visible due to ArrayExceptions-relaxation; therefore, it is permitted to do this check beforehand and fail early. Of course, if the check passes, then the subsequent array accesses do not need bounds checks. This could be a significant gain.

Also, it is necessary to check for `howmany < 0` first, because in that case the strict execution of CIL completes without an exception --- if we checked `howmany > summands.Length` first, it could lead to a `NullReferenceException` out of nowhere (of course, a CLI implementation could choose to trap the access violation and not turn it into an exception but return from the function, but that is so crazy that no implementer would do it in real life).
On the other hand, we want calls from E-relaxed methods to E-relaxed methods to be inlinable “as if” they were inlined by hand at the source level. That is why an E-relaxed sequence is allowed to span between methods.
I think this rationale is based on optimization opportunities.
For example, consider
struct S { public int field; }
void For1d(ref object result, int count, object[] array)
{
for (int i = 0; i < count; ++i) result = (S)array[i];
}
void For2d(ref object result, int count, object[][] jagged)
{
for (int i = 0; i < count; ++i) For1d(ref result, count, jagged[i]);
}
A call to `For1d` could fail for these reasons: `array == null` (null reference), `array.Length < count` (index out of range), or `!(array[i] is S)` for some `0 <= i < count` (invalid cast).
Under relaxation, if a check fails, some prefix of the unboxings as `S` and boxings into `result` can be done; `result` is assigned to the last successful conversion. (It's permitted to not box the intermediate copies, just the last, but this'd be a fairly complicated optimization and it's unlikely.)

Similarly, `For2d` could fail for these reasons: `jagged == null`, `jagged.Length < count`, or a nested call to `For1d` fails.
If relaxation is per-method and not "inter-method", then in case `For2d` fails during a call to `For1d`, it should set `result` to either the effect of the last successful `For1d`, or an indeterminate effect causable by the failing `For1d`. This makes it hard to produce performant code.
If relaxation is inter-method, then upon entrance to `For2d` the VES can perform all the checks, and if all of them pass, unbox `jagged[count - 1][count - 1]` as `S` and box it into `result`. There is no need for `count` or `count * count` boxings!
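With inter-method relaxation, the generated code could be morally equivalent to this sketch (`For2dRelaxed` is a hypothetical name; all checks are hoisted, and only the final conversion is materialized --- the sketch simplifies by reporting a null element as `InvalidCastException`, which relaxation permits since it is unspecified which failing check's exception is thrown):

```csharp
struct S { public int field; }

static void For2dRelaxed(ref object result, int count, object[][] jagged)
{
    if (count <= 0) return; // strict behavior: the loops never run

    // Hoisted checks: any null, range, or cast failure surfaces here,
    // before result is touched. Relaxation permits this because the
    // intermediate stores to result need not be made visible on failure.
    for (int i = 0; i < count; ++i)
    {
        object[] row = jagged[i];   // implicit null + range checks on jagged
        for (int j = 0; j < count; ++j)
        {
            if (!(row[j] is S))     // implicit null + range checks on row
                throw new InvalidCastException();
        }
    }

    // One unbox and one box instead of count * count of them.
    result = (S)jagged[count - 1][count - 1];
}
```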
Having access to E-relaxation could enable optimizing opportunities interesting for library authors or people needing performance-critical code, by sacrificing exception guarantees.
A mechanism to relax exception checks such that inlined method calls are at least as fast as manual inlining should be supported.
Rationale
Under the CIL specification, the runtime is normally only allowed to perform certain optimizations and must preserve side-effects and exceptions generated by a thread in the order that they appear (I.12.6.4).
This means that, when inlining methods, it is not as efficient as manual inlining when you are passing an argument that may potentially have side effects (such as accessing an element of an array). This also means that instruction reordering may not be allowed.
However, it is often desirable that the JIT produce code that is more performant rather than code that is more "compliant".
As such, I propose a mechanism be exposed to relax various existing compilation requirements such that the desired code gen can be achieved.
Example
Under the normal restrictions, the following code:
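This appears to be the `Select`/`MyMethod` example quoted earlier in the thread:

```csharp
int Select(bool condition, int trueValue, int falseValue)
{
    return condition ? trueValue : falseValue;
}

int MyMethod(bool condition, int[] a, int[] b)
{
    return Select(condition, a[5], b[6]);
}
```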
cannot be transformed to be equivalent to:
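That is, the hand-inlined form shown earlier in the thread:

```csharp
int MyMethod(bool condition, int[] a, int[] b)
{
    return condition ? a[5] : b[6];
}
```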
This means that, even if the `Select` call is inlined, it will fail if `condition` is true and `b[6]` is invalid (either `b` is `null` or 6 is out of range), or when `condition` is false and `a[5]` is invalid. However, the manually inlined code does not have this problem.

Proposal
I believe that this functionality is allowed by the existing runtime specification under the pretext of `E-relaxed` methods (see below). The specification gives examples, and even declares members in the BCL, that should allow this functionality today. However, we do not expose the declared members and therefore do not support the functionality.

As such, I propose we expose the missing `System.Runtime.CompilerServices.CompilationRelaxations` values declared in ECMA TR-84 and update the runtime to support them. This will be beneficial both to JIT'd code as well as to AOT'd code that attempts to remain "compliant"/"compatible".
The proposed members are:
Important Spec Sections
I.12.6.4 Optimization
The first part explains the normal limitations and basically says (to my understanding) that instructions with side-effects (including throwing exceptions) may not be reordered with regard to each other.
The second part is a bit more in-depth and even gives examples in Annex F (see below). The important bit is probably the section on `E-checks` and how the timing is allowed to be changed.

VI.Annex F Imprecise faults
The key point in the first section is the fourth bullet-point.
VI.F.1 Instruction reordering
The second section is again important; it explicitly calls out that `E-relaxed` sequences are allowed to span across methods.

VI.F.2 Inlining
This block is slightly less important, but it covers the allowed optimizations between a relaxed and non-relaxed method.
VI.F.4 Interleaved calls