eclipse / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr

Develop guidelines for adding/using IL opcodes and NonHelper symbols in the OMR compiler #3051

Open Leonardo2718 opened 5 years ago

Leonardo2718 commented 5 years ago

Issues #3049 and #2997 have raised the following question: how do we decide whether to use IL opcodes or NonHelper (Intrinsic; see #3050) symbols to represent certain functionality/semantics in the OMR compiler?

I will attempt to summarize the discussion in the comments. (My sincerest apologies if this is incomplete or somehow changes the intended meaning of the original authors.)

@vijaysun-omr @mstoodle @charliegracie @andrewcraik @jdmpapin @ymanton @0xdaryl @mpirvu @gita-omr @xliang6

Leonardo2718 commented 5 years ago

@NigelYiboYu points out that using IL opcodes ({i,l}popcnt) means:

  • more effort is required to maintain and support them, e.g. more IL testing
  • (probably slightly) more optimization opportunities, because we can constant-fold bit counts and open up opportunities for other optimizations. (Although one could argue that simple operations like these won't make much of a performance difference.)
  • potential benefit to other languages/runtimes that employ OMR, e.g. Python's int.bit_length()
Leonardo2718 commented 5 years ago

@fjeremic responded with:

Similarly to #2997, I'm in favor of deprecating this IL. It has no testing, it's not documented or well defined, and the IL flows straight down to the codegen, where it is not implemented on all platforms. It breaks pretty much every rule in the realm of "should this operation be an IL opcode".

  • (probably slightly) more optimization opportunities, because we can constant-fold bit counts and open up opportunities for other optimizations. (Although one could argue that simple operations like these won't make much of a performance difference.)

IMO, the constant-folding point can be made about so many operations. This doesn't seem like enough of an incentive to define a whole IL opcode for a trivial simplification. Consider a similar example of a language that had a method numberOfTrailingZeros(int value) which returns the number of trailing 0 bits in value. Should we define an IL opcode for this operation? Surely if value were a constant we could fold away the entire call!

How many such APIs fit these criteria? Why don't we define them all as IL opcodes then? Can you see the problem with this approach?
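The constant folding being described is trivial to express outside the compiler; here is a hypothetical sketch in Python (the helper name mirrors the Java-style API in the comment and is not OMR code):

```python
def number_of_trailing_zeros(value, width=32):
    """Count the trailing 0 bits of `value`, treated as a `width`-bit integer."""
    value &= (1 << width) - 1
    if value == 0:
        return width  # all bits are zero
    # value & -value isolates the lowest set bit; its position is the answer
    return (value & -value).bit_length() - 1

# With a constant argument, the whole "call" folds to a constant result:
assert number_of_trailing_zeros(40) == 3   # 40 == 0b101000
assert number_of_trailing_zeros(0) == 32
```

The point being made is that this folding alone is cheap to implement for almost any pure library routine, so it cannot by itself justify a dedicated opcode.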

  • potential benefit to other languages/runtimes that employ OMR, e.g. Python's int.bit_length()

Defining population count as a codegen non-helper symbol also achieves this. See #2958 as an example where multiple languages can use atomic codegen symbols without them being an IL. I see no benefit of the IL when the codegen symbols serve the same purpose.

Leonardo2718 commented 5 years ago

@0dvictor noted that:

IMO, popcnt has some fundamental differences from the atomic operations; therefore, arguments from #2997 may not apply. The behavior of popcnt is the same as that of other unary arithmetic/logic operations such as neg and not. They are side-effect free, so they may be reordered or removed by the optimizer. Therefore, popcnt has quite different constraints from call, while atomic operations share a lot of similarities with call.

I totally agree with @fjeremic's point that constant folding by itself doesn't qualify an operation for an IL opcode; however, the optimizer can potentially do much more. Similar to other unary ops like neg, not, sqrt, etc., the optimizer can in theory do at least the following transformations:

  1. Change the evaluation order;
  2. Constant folding;
  3. Value propagation (VP):
    1. If the child has no constraint, popcnt has a constraint of [0, 32] or [0, 64];
    2. If the child is non-zero, the constraint of popcnt is [1, 32] or [1, 64], and popcnt is also non-zero;
    3. If the child has constraint [a, b] and a >= 0, popcnt has a constraint of [0, popcnt(next_power_of_2(b)-1)];
    4. There may be other rules that I cannot think of at the moment;
  4. Simplification patterns, for example:
    1. popcnt(not(a)) <==> (32 or 64) - popcnt(a);
    2. popcnt(a << n) <==> popcnt(a) when a << n does not overflow;
    3. popcnt(rotate(a, n)) <==> popcnt(a);
    4. if (popcnt(a) == 0) <==> if (a == 0);
    5. Other patterns that may be transformed.
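Several of the simplification and range rules above can be sanity-checked with a small script; here is a sketch in Python assuming 32-bit semantics (illustration only, not optimizer code):

```python
MASK32 = 0xFFFFFFFF

def popcnt(x):
    """Population count of x truncated to 32 bits."""
    return bin(x & MASK32).count("1")

for a in (0, 1, 0x1234, 0xDEADBEEF, MASK32):
    # popcnt(not(a)) == 32 - popcnt(a)
    assert popcnt(~a) == 32 - popcnt(a)
    # popcnt(a << n) == popcnt(a) when the shift does not overflow
    n = 3
    if (a << n) <= MASK32:
        assert popcnt(a << n) == popcnt(a)
    # popcnt(rotate(a, n)) == popcnt(a)
    rot = ((a << n) | (a >> (32 - n))) & MASK32
    assert popcnt(rot) == popcnt(a)
    # popcnt(a) == 0 exactly when a == 0
    assert (popcnt(a) == 0) == (a == 0)
    # VP-style range: the result is always within [0, 32]
    assert 0 <= popcnt(a) <= 32
```

An optimizer encoding these rules would apply them symbolically to IL trees, of course; the script only demonstrates that the identities hold.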

These are pretty much what the optimizer can do for other arithmetic/logic operations; therefore I think popcnt deserves its own IL opcode, as do trailing/leading-zero counts, for which OMR already has IL opcodes.

I disagree with the argument that it should not be an IL opcode because of a lack of documentation or tests. Unfortunately, most, if not all, of OMR's IL opcodes are not currently well documented. There is an ongoing effort (#2569) to complete the documentation and tests. If an operation qualifies for being an IL opcode, documentation and tests can always be added. On the other hand, being a NonHelper should not mean looser requirements on documentation or tests.

In short, I prefer to keep popcnt as an IL Op.

Leonardo2718 commented 5 years ago

@fjeremic responded:

Though these are compelling arguments, I would venture to say they apply to very many operations I can think of, which would therefore also be candidates for IL opcodes by this definition. If we go down this road, I'm afraid our set of IL opcodes would grow to unmaintainable proportions.

As an example of a framework similar to OMR, let's take a look at LLVM [1]. It defines LLVM instructions, which are much like our IL opcodes in OMR, and LLVM intrinsic functions, which are call-like operations much like our codegen non-helper symbols. Taking a closer look, LLVM defines population count as an intrinsic function [2]. According to the official documentation and guidelines, intrinsic functions are preferred for operations which can be expressed as a function call and are transparent to optimization passes, similar to our codegen non-helper symbols [3]:

Adding an intrinsic function is far easier than adding an instruction, and is transparent to optimization passes. If your added functionality can be expressed as a function call, an intrinsic function is the method of choice for LLVM extension.

In fact, if you take a look at the full set of LLVM intrinsic functions, you'll find that effectively all of them fit the 8 points you listed above. There are over 50 of them. With the number of datatypes OMR supports, that's well over 400 new IL opcodes which could be added. (Once typed vector IL gets added, it's in the thousands, really!) Do you think that's a bit too much? I certainly do.

[1] http://llvm.org/docs/LangRef.html [2] http://llvm.org/docs/LangRef.html#llvm-ctpop-intrinsic [3] https://llvm.org/docs/ExtendingLLVM.html
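For concreteness, the representation being referenced looks roughly like this in LLVM IR (a sketch based on the ctpop entry in the LangRef; not OMR code):

```llvm
; Population count is not an instruction in LLVM; it is an intrinsic,
; declared and called like an ordinary function.
declare i32 @llvm.ctpop.i32(i32)

define i32 @bit_count(i32 %x) {
entry:
  %n = call i32 @llvm.ctpop.i32(i32 %x)
  ret i32 %n
}
```

Because the intrinsic is syntactically a call, adding one requires no new instruction definitions, yet the passes that know about it can still constant-fold or strength-reduce it.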

Leonardo2718 commented 5 years ago

@fjeremic suggests:

Given the above discussion, it seems to me that we need to come up with some sort of guideline, inspired by LLVM, on when an operation should be made into an IL opcode. IMO, the sooner we define this the better; otherwise we are going to have a hard time adding or removing IL from OMR.

Leonardo2718 commented 5 years ago

@0dvictor asked:

I second the idea of setting up a guideline on whether an operation should be an IL opcode.

To help set up a guideline, I'm raising the following questions:

  1. If an operation can be optimized as much as a simple arithmetic/logic operation, should it be an IL opcode? (sqrt, popcnt, fma*, and other math helpers, e.g. trigonometric functions)
  2. If an operation behaves closely like a black-box call, should it be a NonHelper? (read/write barriers, atomic operations, e.g. CompareAndSwap)
  3. When possible, should the optimizer try as much optimization as possible? (math functions)
  4. If the optimizer needs a similar amount of work to optimize a particular operation as it does an existing IL opcode, what are the benefits and drawbacks of the new operation being an IL opcode versus a NonHelper?
  5. If the same level of code quality is warranted, why is the maintenance cost of a NonHelper lower than that of an IL opcode?
Leonardo2718 commented 5 years ago

@ymanton commented:

These two representations are basically the same if you think about it. One carries its meaning in the opcode and the other carries its meaning in the symbol, but they express the exact same thing. The optimizer should be able to optimize popcnt exactly the same way whether it's an opcode or a call. Calls can be side-effect free, have constraints, be reduced to a constant, and all that fun stuff. The criterion for whether something should be expressed as an opcode or an intrinsic call has to be something other than how well it can be optimized, because in theory you can teach the optimizer to optimize any equivalent representation the same way.

Here's an alternative criterion: how likely is it to map to a hardware instruction or two on most architectures? Most unary and binary logical and arithmetic ops fit, most floating point ops fit, and popcnt fits. CAS and friends don't fit and should be intrinsics instead, because many (most?) architectures don't have CAS-style atomic instructions; they have LL/SC-style atomic instructions. Some opcodes that we have now, like new and monent and a lot of the Java-bytecode-inspired ones, really don't make sense as opcodes and make a lot more sense as intrinsics, especially since in J9 you can simply convert the opcode to a call, evaluate it, and it will "just work" because the symbols for new and monent are the corresponding callable VM helpers.

Having some opcodes evaluate to 1 instruction and others to 100 instructions hints that these things are not exactly alike and shouldn't be treated the same way. Having all opcodes be instruction-like and punting everything that has more complexity than that into the intrinsics bucket seems like a decent way to categorize things.

Leonardo2718 commented 5 years ago

@ymanton also commented:

Actually, now that I think about it, @0dvictor mentioned something I overlooked. Our IL can express "pure calls", which are essentially side-effect-free calls, but we always anchor calls, which counteracts that and makes them fundamentally different from IL ops that can appear as children. If we stick to that convention, it means that a popcnt intrinsic will never quite behave like a popcnt IL op, which doesn't have or need a fixed evaluation point and would benefit from that sort of code motion. That suggests another criterion: if your operation doesn't need a fixed evaluation point, you should make it an opcode, because an intrinsic can't give you that even if it's pure; and if your operation needs a fixed evaluation point, it should be an intrinsic.

If you were to combine those two criteria, you would end up with: opcodes should be things that are likely to be hardware instructions AND don't need a fixed evaluation point, while things that are not likely to be HW instructions, or that need a fixed evaluation point, should be intrinsics.

When I think of examples to fit these categories, I can't think of anything that would be likely to be a HW instruction but need a fixed evaluation point. Perhaps instructions that can cause exceptions, like traps and monitored loads, or instructions to read timestamps and PMU counters, would have to be expressed as intrinsics rather than opcodes, which actually sounds reasonable now that I think of it; but then again, these things probably don't need to be represented in the IL in the first place.

fjeremic commented 5 years ago

Similarly to LLVM, I'm in favor of adopting the notion that OMR IL should be supported on all architectures OMR targets, while intrinsic (NonHelper) symbols may be implemented only on certain architectures and can, for example, be reduced from some language library call.

I'm more on the conservative side here, as I'm trying to avoid OMR IL bloat. To me, if an OMR IL opcode exists, I should be able to use it on all targets OMR supports. The same cannot be said for intrinsic functions, as they can be exotic. Not only does this simplify the optimizer and the things it should care about, but it makes porting OMR to a new architecture much easier. E.g., take a look at AArch64 and how long it is taking to implement the evaluators for all the IL we "support".

Edit:

For example I don't think queries such as this should exist:

https://github.com/eclipse/omr/blob/9bf4244f7498c9f7901ec1218f67b0da313b804a/compiler/env/OMRCPU.hpp#L109-L112

IL should be such a primitive operation that it must be supported on all OMR targets. One should not have to query whether one can use a particular IL. This is what intrinsics should be for.

ymanton commented 5 years ago

I'm more on the conservative side here, as I'm trying to avoid OMR IL bloat. To me, if an OMR IL opcode exists, I should be able to use it on all targets OMR supports.

That might be a little too conservative. What happens if you add a new platform that doesn't support some existing opcodes? Do you change them to intrinsics for everybody? I think "has a reasonable chance of being a HW instruction" is probably an easier criterion to maintain and less likely to back you into a corner later.

0xdaryl commented 5 years ago

I'd like to move this discussion to today's architecture meeting: @0dvictor @ymanton @fjeremic @andrewcraik @gita-omr @Leonardo2718 @vijaysun-omr