tannergooding opened 7 years ago
FYI. @JosephTremoulet
So after re-reading the sections a few times, I'm fairly positive that this is supported under the pretext of E-relaxed methods/sequences/checks.
There are specific sections that not only call out that E-sequences
can span across methods, but that these optimizations are specifically for things like allowing methods inlined by the JIT to be at least as fast as manually inlined methods.
@tannergooding, I think you're combining "side-effects" and "checks"/"exceptions" into a single concept in a way that the spec does not, and that this may be the cause of your incorrect reading of it. Side-effects and checks are separate throughout the language quoted above. Relaxation allows reordering some checks across each other and across side-effects. This means that, in terms of observable differences, when some check fails, some side-effects may be suppressed, and other (prior) checks may appear to be suppressed. There is nothing in here about suppressing checks, except in that they are implicitly suppressed when reordered after a subsequent check that fails.
Also, a "manual inlining" of your example:
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
return Select(condition, a[5], b[6]);
}
would produce
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
bool _condition = condition;
int _trueValue = a[5];
int _falseValue = b[6];
int _result = _condition ? _trueValue : _falseValue;
return _result;
}
Further modifications that alter the semantics (by making the type safety checks conditional) are rewrites that produce semantically different programs, not "inlining".
Why there's discussion of interleaving instructions and being "as fast as" manual inlining is e.g. if the call were in a loop, you'd want to be able to hoist operations from the inlinee out of the loop even though doing so moves them past instructions from the caller that remain in the loop.
@CarolEidt could speak with more authority than I can as to the intent and details of the spec. The current JITs don't actually pay attention to the relaxed exception control bits, so this is somewhat academic without changing the JIT, and I don't know the history of why we have this complicated bit of spec that we don't actually take advantage of in codegen.
I think you're combining "side-effects" and "checks"/"exceptions" into a single concept in a way that the spec does not, and that this may be the cause of your incorrect reading of it.
@JosephTremoulet. I might be misunderstanding, but I don't think my proposal would be suppressing the check, merely moving it so that it is only executed if the variable in question is actually accessed.
From VI.F.1
Authors of strict methods can reason about their behavior without knowing details about whether their callers or callees are relaxed, because strict instructions act as a fence. On the other hand, we want calls from E-relaxed methods to E-relaxed methods to be inlinable “as if” they were inlined by hand at the source level. That is why an E-relaxed sequence is allowed to span between methods.
It explicitly states that they want calls from E-relaxed methods to E-relaxed methods to be inlinable "as if" they were inlined by hand at the source level.
I don't know anyone who would inline it by hand to be (as this would be a literal inlining of the raw IL):
int MyMethod(bool condition, int[] a, int[] b)
{
bool _condition = condition;
int _trueValue = a[5];
int _falseValue = b[6];
int _result = _condition ? _trueValue : _falseValue;
return _result;
}
You would inline it by hand to be:
int MyMethod(bool condition, int[] a, int[] b)
{
return condition ? a[5] : b[6];
}
A conforming implementation of the CLI is free to change the timing of relaxed E-checks in an E-relaxed sequence, with respect to other checks and instructions as long as the observable behavior of the program is changed only in the case that a relaxed E-check fails.
My understanding of this is that the implicit check from an instruction can be reordered anywhere, provided that the observable behavior is only changed in the case that the check fails.
So, unless a range check has an observable side effect in the case it succeeds, we should be free to move the check to execute at a later point in the program.
That is, if `condition` is false and `a[5]` would fail (either because `a` is `null` or because 5 is out of range), we do not have to fail the execution of the method, since `a` is not actually accessed in this code path (provided the appropriate relaxations are in place).
Aha. You're interpreting "reorder" and "suppress" in terms of the static view of how the program is rewritten. I believe the spec is discussing the operational semantics, and that "reorder" and "suppress" are meant in terms of the dynamic stream of visible events that the executing program might trigger. So "suppress" means "remove from the dynamic stream", not "remove from the static view of the program", and "may be reordered" means "may occur in a different order in the dynamic event stream", not "may have one or the other suppressed from the dynamic stream so long as it is still present somewhere in the static view of the program".
So today, we have the following streams
MyMethod:
Load condition
Null Check A
Range Check A
Load a[5]
Null Check B
Range Check B
Load b[6]
Call Select
Return
Select:
Load Condition
Branch on False to F
T:
Load trueValue
Return
F:
Load falseValue
Return
Today, under the normal requirements, it is transformed to:
MyMethod:
Null Check A
Range Check A
Null Check B
Range Check B
Load Condition
Branch on False to F
T:
Load A[5]
Return
F:
Load B[6]
Return
My proposal is that, following the relaxations we transform it to:
MyMethod:
Load Condition
Branch on False to F
T:
Null Check A
Range Check A
Load A[5]
Return
F:
Null Check B
Range Check B
Load B[6]
Return
So we are not "suppressing" anything (removing it from the dynamic stream); we are only "reordering" (it may occur in a different order). Because the Null Check and the Range Check have no side-effect in the case they succeed, this is allowed.
I do not think the reordered sequence must guarantee that the exception is still thrown in all possible code paths. If that were the case, then reordering the check elsewhere is effectively pointless.
Users (especially users who care about performance) would much rather the method fail earlier rather than later.
As such, I believe that this (in addition to all the mentions of the generated code being as optimal as hand-inlining the method in the original source) means my view is correct. The check can be reordered to be later in the stream and is not required to be executed in all subsequent code paths; merely, the check must still exist and must still provide the appropriate protection to the relevant operation (in this case, the null and range checks must still exist before accessing an element of the array, but they do not have to exist in a path where you do not access said array).
Talking past each other... maybe I should have said "trace" instead of "stream"? The reasoning goes something like this:
int Select(bool condition, int trueValue, int falseValue)
{
return condition ? trueValue : falseValue;
}
int MyMethod(bool condition, int[] a, int[] b)
{
return Select(condition, a[5], b[6]);
}
For each possible set of inputs, consider:
2.1. What dynamic trace of events does the unoptimized program execute?
2.2. What dynamic trace of events does the optimized program execute?
The optimization is legal only if, for all possible sets of inputs, the traces from 2.1 and 2.2 are the same up to whatever latitude the spec dictates.
In this case, if the input has `condition` true, `a` some array of length greater than 5, and `b` some array of length greater than 6, then the unoptimized trace is:

(load local arguments); null check and bounds check on "a" that pass; load a[5] from memory; null check and bounds check on "b" that pass; load b[6] from memory; test "condition"; return the value that was loaded from a[5]

and the optimized trace is:

(load local arguments); test "condition"; null check and bounds check on "a" that pass; load a[5] from memory; return the value that was loaded from a[5]

and these agree in terms of visible side-effects and exceptions, so hooray.
But if the input has `condition` true, `a` some array of length greater than 5, and `b` null, then the unoptimized trace is:

(load local arguments); null check and bounds check on "a" that pass; load a[5] from memory; null check on "b" that fails; raise NullReferenceException

while the optimized trace is:

(load local arguments); test "condition"; null check and bounds check on "a" that pass; load a[5] from memory; return the value that was loaded from a[5]

and the fact that `raise NullReferenceException` was in the first trace but not the second is what it means to "suppress" that failing check.
@JosephTremoulet, I still don't think that is quite right. That leaves this entire portion of the spec effectively useless.
If a check is going to fail (that is, cause an exception), then moving it to later in the program is effectively useless (additional code will execute, but the function will still fail overall, program execution will be delayed, etc.).
Based on all the surrounding context, I am almost certain that this implies that the check is not required to exist in every subsequent code path, but instead that it is only required to exist in the code paths where it provides the required validation.
That leaves this entire portion of the spec effectively useless.
Not so. Consider this example:
class SomeClass {
    int someField;
    static void addToEach(int[] a, SomeClass o, int start, int stop) {
        for (int i = start; i != stop; ++i) {
            a[i] = a[i] + o.someField;
        }
    }
}
That loop has a null check on `a`, then a bounds check on `a[i]`, then a field load which implicitly does a null check on `o`. But `o.someField` doesn't change in the loop, and loads are expensive, so you'd like the compiler to be able to load `o.someField` before it runs the loop. Doing that would change the observed exception raised from `IndexOutOfRangeException` to `NullReferenceException` in the event that `a` is non-null but `start` is greater than or equal to `a.Length`. With relaxed exceptions, we can hoist it anyway -- the checks get reordered, but in a sense that these sections carefully lay out to be legal.
Loop rewriting is explicitly called out in a separate portion, with such an example.
All the sections about method inlining under relaxations are explicitly separate; they explicitly call out optimizing in a way equivalent to having inlined the code in the original source (not in IL), being allowed to optimize as if the called method were a C/C++-style macro, etc. They all indicate that my assumptions are correct and the check can be moved such that it only exists on the path where it would be required (for safety/verifiability).
The spec says "The check’s exception is thrown some time in the sequence, unless the sequence throws another exception." You're asking for an optimization to ignore that, based not on any normative text but on the basis that the rationale text in another section uses a phrase that you interpret to include this particular semantics-altering rewrite you have in mind. We'll have to agree to disagree.
Yes, but that entirely depends on how you interpret "sequence" (whether it is a single code path, or all code up until the next non-trivial protected or handler region).
There is at least one example in the spec (on phone, will link in a bit) where it defines a sequence to contain a branch.
@CarolEidt, could you comment as to whether my understanding of the spec is correct?
If it isn't, that is fine. I am just wanting to know whether I should update the proposal.
If my understanding is correct and if the attributes are there for use; then even if the JIT doesn't use them today, other tools could (CoreRT could use them for better codegen as well, for example).
If it isn't supported then I would want to update the proposal to indicate that such a thing should be possible (I should be able to opt-in towards having my code inlined, as if I had inlined it by hand, regardless of the normal runtime rules).
I want to chime in a bit on the spec and the value of having access to E-relaxed sequences.
For the purpose of this discussion, let's suppose that the C# compiler transforms the code "canonically", e.g., it does not convert `int a = new int[]{1}[0]` into `int a = 0` and will indeed emit `newarr` and `ldelem` etc. This is important because E-relaxation is about the effect of executing CIL, not the effect of translating C# to CIL.
The OP's initial example:
int IIf(bool condition, int @true, int @false) { return condition ? @true : @false; }
var items = new int[] { 1 };
int result = IIf(items.Length > 1, items[1], items[0]);
If we manually inline the emitted CIL (so method call using CIL is converted to substitution of method CIL body):
var items = new int[] { 1 };
// C# evaluates arguments from left to right.
bool condition = (items.Length > 1);
int @true = items[1];
int @false = items[0];
int result = (condition ? @true : @false);
In this example, it does not matter whether the sequence is ArrayExceptions-relaxed. The result is always an `IndexOutOfRangeException` being thrown.
The spec says:
A conforming implementation of the CLI is free to change the timing of relaxed E-checks in an E-relaxed sequence, with respect to other checks and instructions as long as the observable behavior of the program is changed only in the case that a relaxed E-check fails. If an E-check fails in an E-relaxed sequence:
- The rest of the associated instruction must be suppressed, in order to preserve verifiability. If the instruction was expected to push a value on the VES stack, no subsequent instruction that uses that value should visibly execute.
- It is unspecified whether or not any or all of the side effects in the E-relaxed sequence are made visible by the VES.
- The check’s exception is thrown some time in the sequence, unless the sequence throws another exception. When multiple relaxed checks fail, it is unspecified as to which exception is thrown by the VES.
In OP's example, the sequence of CIL causes the array bounds check to fail at `items[1]`, and it does not cause any other failed checks or exceptions. Therefore, the spec mandates that `IndexOutOfRangeException` be thrown.
The rest of this comment is based on my understanding of the spec. Let me first demonstrate one interesting possible consequence of E-relaxation, before going to a more practically relevant example.
class S { public static object field = null; }
try
{
var o = new object();
var a = new int[0];
// store 1
S.field = 1;
var foo = (string)o;
var bar = a[1];
// store 2
S.field = 2.0;
}
catch (Exception ex)
{
Console.WriteLine(S.field is int
? "field is int"
: S.field is double
? "field is double"
: S.field == null
? "field == null"
: "other");
Console.WriteLine(ex.GetType());
}
Suppose the sequence is InvalidCastException- and ArrayExceptions-relaxed; then the VES is free to choose from the following behaviors:

1. `field is int` + `InvalidCastException` (= store 1, then check cast)
2. `field is int` + `IndexOutOfRangeException` (= store 1, then check index)
3. `field == null` + `InvalidCastException` (= check cast before trying store 1)
4. `field == null` + `IndexOutOfRangeException` (= check index before trying store 1)

The check’s exception is thrown some time in the sequence, unless the sequence throws another exception. When multiple relaxed checks fail, it is unspecified as to which exception is thrown by the VES.

This means that either `InvalidCastException` or `IndexOutOfRangeException` can be thrown, and one of them must be thrown.
It is unspecified whether or not any or all of the side effects in the E-relaxed sequence are made visible by the VES.
In cases 3 and 4, the side effect of setting `S.field` is not made visible.
However, it is not possible that `field is double`, because
The rest of the associated instruction must be suppressed
The primary usage of E-relaxation is to hoist checks.
void AccumulateFirst10(ref int sum, int howmany, params int[] summands)
{
for (int i = 0; i < howmany; ++i)
{
sum += summands[i];
}
}
int result = 0;
try
{
AccumulateFirst10(ref result, 100, 1, 2, 3);
}
catch (Exception ex)
{
Console.WriteLine(result);
Console.WriteLine(ex.GetType());
}
If `AccumulateFirst10` is ArrayExceptions-strict, then the effect of the call is to set `result` to `6` and throw `IndexOutOfRangeException`. The compiler is highly likely to simply do the array bounds check in every iteration.
However, if the method is ArrayExceptions-relaxed, then it's possible that `result` is `0` --- the native code can be morally equivalent to this:
void AccumulateFirst10(ref int sum, int howmany, params int[] summands)
{
if (howmany < 0) return;
// hoisted check
if (howmany > summands.Length) throw new IndexOutOfRangeException();
for (int i = 0; i < howmany; ++i)
{
// summands[i] is loaded WITHOUT bound check
sum += summands[i];
}
}
The reason is as follows. If `howmany` is greater than `summands.Length`, then the loop will eventually lead to `IndexOutOfRangeException`, in which case the side effect of editing `sum` need not be made visible due to ArrayExceptions-relaxation; therefore, it is permitted to do this check beforehand and fail early. Of course, if the check passes, then the subsequent array accesses do not need bounds checks. This could be a significant gain.

Also, it is necessary to check for `howmany < 0` first, because in that case the strict execution of CIL completes without an exception --- if we checked `howmany > summands.Length` first, it could lead to a `NullReferenceException` out of nowhere (of course, a CLI implementation could choose to trap the access violation and not turn it into an exception but return from the function, but that is so crazy that no implementer would do it in real life).
On the other hand, we want calls from E-relaxed methods to E-relaxed methods to be inlinable “as if” they were inlined by hand at the source level. That is why an E-relaxed sequence is allowed to span between methods.
I think this rationale is based on optimization opportunities.
For example, consider
struct S { public int field; }
void For1d(ref object result, int count, object[] array)
{
for (int i = 0; i < count; ++i) result = (S)array[i];
}
void For2d(ref object result, int count, object[][] jagged)
{
for (int i = 0; i < count; ++i) For1d(ref result, count, jagged[i]);
}
A call to `For1d` could fail for these reasons: `array == null` (null reference), `array.Length < count` (index out of range), or `!(array[i] is S)` for some `0 <= i < count` (invalid cast).
Under relaxation, if a check fails, some prefix of the unboxings as `S` and boxings into `result` can be done; `result` is assigned to the last successful conversion. (It's permitted to not box the intermediate copies, just the last, but this'd be a fairly complicated optimization and it's unlikely.)

Similarly, `For2d` could fail for these reasons: `jagged == null`, `jagged.Length < count`, or a nested call to `For1d` fails.
If relaxation is per-method and not "inter-method", then in case `For2d` fails during a call to `For1d`, it should set `result` to either the effect of the last successful `For1d`, or an indeterminate effect causable by the failing `For1d`. This makes it hard to produce performant code.
If relaxation is inter-method, then upon entrance to `For2d` the VES can perform all the checks, and if all of them pass, unbox `jagged[count - 1][count - 1]` as `S` and box it into `result`. There is no need for `count` or `count * count` boxings!
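With inter-method relaxation, the generated code could be morally equivalent to this sketch (`For2dRelaxed` is a hypothetical name; all checks are hoisted, and only the final conversion is materialized --- the sketch simplifies by reporting a null element as `InvalidCastException`, which relaxation permits since it is unspecified which failing check's exception is thrown):

```csharp
struct S { public int field; }

static void For2dRelaxed(ref object result, int count, object[][] jagged)
{
    if (count <= 0) return; // strict behavior: the loops never run

    // Hoisted checks: any null, range, or cast failure surfaces here,
    // before result is touched. Relaxation permits this because the
    // intermediate stores to result need not be made visible on failure.
    for (int i = 0; i < count; ++i)
    {
        object[] row = jagged[i];   // implicit null + range checks on jagged
        for (int j = 0; j < count; ++j)
        {
            if (!(row[j] is S))     // implicit null + range checks on row
                throw new InvalidCastException();
        }
    }

    // One unbox and one box instead of count * count of them.
    result = (S)jagged[count - 1][count - 1];
}
```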
Having access to E-relaxation could enable optimizing opportunities interesting for library authors or people needing performance-critical code, by sacrificing exception guarantees.
A mechanism to relax exception checks such that inlined method calls are at least as fast as manual inlining should be supported.
Rationale
Under the CIL specification, the runtime is normally only allowed to perform certain optimizations and must preserve side-effects and exceptions generated by a thread in the order that they appear (I.12.6.4).
This means that, when inlining methods, it is not as efficient as manual inlining when you are passing an argument that may potentially have side effects (such as accessing an element of an array). This also means that instruction reordering may not be allowed.
However, it is often desirable that the JIT produce code that is more performant rather than code that is more "compliant".
As such, I propose a mechanism be exposed to relax various existing compilation requirements such that the desired code gen can be achieved.
Example
Under the normal restrictions, the following code:
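This appears to be the `Select`/`MyMethod` example quoted earlier in the thread:

```csharp
int Select(bool condition, int trueValue, int falseValue)
{
    return condition ? trueValue : falseValue;
}

int MyMethod(bool condition, int[] a, int[] b)
{
    return Select(condition, a[5], b[6]);
}
```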
cannot be transformed to be equivalent to:
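That is, the hand-inlined form shown earlier in the thread:

```csharp
int MyMethod(bool condition, int[] a, int[] b)
{
    return condition ? a[5] : b[6];
}
```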
This means that, even if the `Select` call is inlined, it will fail if `condition` is true and `b[6]` is invalid (either `b` is `null` or 6 is out of range), or when `condition` is false and `a[5]` is invalid. However, the manually inlined code does not have this problem.

Proposal
I believe that this functionality is allowed by the existing runtime specification under the pretext of `E-relaxed` methods (see below). The specification gives examples, and even declares members in the BCL, that should allow this functionality today. However, we do not expose the declared members and therefore do not support the functionality.

As such, I propose we expose the missing `System.Runtime.CompilerServices.CompilationRelaxations` values declared in ECMA TR-84 and update the runtime to support them. This will be beneficial both to JIT'd code as well as to AOT'd code that attempts to remain "compliant"/"compatible".
The proposed members are:
Important Spec Sections
I.12.6.4 Optimization
The first part explains the normal limitations and basically says (to my understanding) that instructions with side-effects (including throwing exceptions) may not be reordered with regard to each other.
The second part is a bit more in-depth and even gives examples in Annex F (see below). The important bit is probably the section on `E-checks` and how the timing is allowed to be changed.

VI.Annex F Imprecise faults
The key point in the first section is the fourth bullet-point.
VI.F.1 Instruction reordering
The second section is again important; it explicitly calls out that `E-relaxed` sequences are allowed to span across methods.

VI.F.2 Inlining
This block is slightly less important, but it covers the allowed optimizations between a relaxed and non-relaxed method.
VI.F.4 Interleaved calls