Remove or keep `TR::ipopcnt` and `TR::lpopcnt` IL

NigelYiboYu commented 6 years ago

This issue is opened to ask for suggestions as to whether ipopcnt and lpopcnt IL opCodes are worth keeping.

It recently came to our attention that these two simply get no optimization and are passed right to the code generator for evaluation. This makes them seemingly redundant.

Keeping ipopcnt and lpopcnt means:

more effort is required to maintain and support them. e..g. more IL testing
(probably slightly) more optimization oppourtunities because we can constant-fold bit counts and open up opportunities for other optimizations to happen. (Although, one can argue that simple operations like these wont' make too much performance difference)
potential benefit to other languages/runtimes that employ OMR. e.g. python's int.bit_length()

I wonder what others think about these two. shall we remove them or keep them as is?

NigelYiboYu commented 6 years ago

@fjeremic FYI

NigelYiboYu commented 6 years ago

@0dvictor @andrewcraik

fjeremic commented 6 years ago

Similarly to #2997 I'm in favor of deprecating this IL. It has no testing, it's not documented or well defined and the IL flows straight down to the codegen where it is not implemented on all platforms. It breaks pretty much every rule in the real of "should this operation be an IL opcode".

(probably slightly) more optimization oppourtunities because we can constant-fold bit counts and open up opportunities for other optimizations to happen. (Although, one can argue that simple operations like these wont' make too much performance difference)

IMO the constant folding point can be said about so many operations. This doesn't seem like enough of an incentive to define a whole IL for a trivial simplification. Consider a similar example of a language that had a method numberOfTrailingZeros(int value) which returns the number trailing 0 bits in value. Should we define an IL for this operation? Surely if value was a constant we can fold away the entire call!

How many such APIs fit this criteria? Why don't we define them all as IL then? Can you see the problem with this approach?

potential benefit to other languages/runtimes that employ OMR. e.g. python's int.bit_length()

Defining population count as a codegen non-helper symbol also achieves this. See #2958 as an example where multiple languages can use atomic codegen symbols without them being an IL. I see no benefit of the IL when the codegen symbols serve the same purpose.

@0xdaryl thoughts?

0dvictor commented 6 years ago

IMO, popcnt has some fundamental differences from the atomic operations; therefore, arguments from #2997 may not apply. Behavior of popcnt is same as other unary arithmetic/log operations such as neg and not. They are side-effect free, so that may be re-ordered or removed by Optimizer. Therefore popcnt has quite different constraint from call, while atomic operations share a lot similarities with call.

I totally agree @fjeremic 's point about constant folding itself doesn't make it qualify for an IL Op; however, Optimizer can potentially do much more. Similar to other unary op like neg, not, sqrt, etc. Optimizer can in theory do at least following transformations:

Change the evaluation order;
Constant folding;
VP: 3.1 If the child has no constraint, popcnt has a constraint [0, 31] or [0, 63]; 3.2 If the child is not zero, the constraint of popcnt is [1, 31] or [1, 63], and popcnt is also non-zero; 3.3 If the child has constraint [a, b] and a >= 0, popcnt has a constraint [1, popcnt(next_power_of_2(b)-1)]; 3.4 There may be other rules that I cannot think of at the moment;
popcnt(not(a)) <==> 31or63 - popcnt(a);
popcnt(a << n) <==> popcnt(a) when a<<n does not over flow;
popcnt(rotate(a,n)) <==> popcnt(a);
if (popcnt(a) == 0) <==> if (a==0);
Other patterns that may be transformed.

These are pretty much what Optimizer can do for other arithmetic/logic operations; therefore I think popcnt deserves its own IL opcode. So do trailing/leading-zero count. OMR also have IL op for them as well.

I disagree with the argument that it should not be an IL Op because of lack of documentation or test. Unfortunately most of if not all of OMR's IL Op are not currently well-documented. There is an on-going effort (#2569) to complete the documentation and tests. If an operation qualifies for being an IL op, documentation and tests can always be added. On the other hand, being a NonHelper should not mean looser requirement on documentation or tests.

In short, I prefer to keep popcnt as an IL Op.

fjeremic commented 6 years ago

Change the evaluation order;

Constant folding;

VP: 3.1 If the child has no constraint, popcnt has a constraint [0, 31] or [0, 63]; 3.2 If the child is not zero, the constraint of popcnt is [1, 31] or [1, 63], and popcnt is also non-zero; 3.3 If the child has constraint [a, b] and a >= 0, popcnt has a constraint [1, popcnt(next_power_of_2(b)-1)]; 3.4 There may be other rules that I cannot think of at the moment;

popcnt(not(a)) <==> 31or63 - popcnt(a);

popcnt(a << n) <==> popcnt(a) when a<<n does not over flow;

popcnt(rotate(a,n)) <==> popcnt(a);

if (popcnt(a) == 0) <==> if (a==0);

Other patterns that may be transformed.

Though these are compelling arguments I would vouch to say they can apply to so many operations I can think of and as such would also be candidates for IL by this definition. If we go down this road our set of IL would grow to unmaintainable proportions I'm afraid.

As an example of a framework similar to OMR let's take a look at LLVM [1]. It defines LLVM instructions which are much like our IL opcodes in OMR, and LLVM intrinsic functions which are call like operations much like our codegen non-helper symbols. Taking a closer look, LLVM defines population count as an intrinsic function [2]. According to official documentation and guidelines intrinsic functions are preferred for operations which can be expressed as a function call and are transparent to optimization passes, similarly to our codegen non-helper symbols [3]:

Adding an intrinsic function is far easier than adding an instruction, and is transparent to optimization passes. If your added functionality can be expressed as a function call, an intrinsic function is the method of choice for LLVM extension.

In fact if you take a look at the full set of LLVM intrinsic functions you'll find that effectively all of them fit the 8 points you listed above. There are over 50 of them. With the number of datatypes OMR supports that's well over 400 new IL opcodes which can be added. [Once typed vector IL gets added its in the thousands really!] Do you think that's a bit too much? I certainly do.

[1] http://llvm.org/docs/LangRef.html [2] http://llvm.org/docs/LangRef.html#llvm-ctpop-intrinsic [3] https://llvm.org/docs/ExtendingLLVM.html

fjeremic commented 6 years ago

Given the above discussion it seems to me that we need to come up with some sort of a guideline, inspired by LLVM, on when some operation should be made into an IL. IMO the sooner we define this the better otherwise we are going to have a hard time adding or removing IL from OMR.

Perhaps this is a good topic to discuss on the next compiler architecture meeting, or in #2569 which seems more appropriate. However we can continue the conversation here as well. Tagging several other people to start collecting some thoughts:

@vijaysun-omr @mstoodle @charliegracie @andrewcraik @jdmpapin @Leonardo2718 @ymanton @0xdaryl @mpirvu @gita-omr @xliang6

0dvictor commented 6 years ago

I second the idea to set up a guideline on whether an operation should be an IL.

To help set up a guideline, I'm raising following questions:

If an operation can be optimized as much as a simple arithmetic/logic operation, should it be an IL? (sqrt, popcnt, fma*, and other math helpers i.e. trigonometric functions)
If an operation behaves closely similar to a black-box call, should it be a NonHelper? (read/write barriers, atomic operations i.e. CompareAndSwap)
When possible, should Optimizer try as much optimization as possible? (math functions)
If Optimizer need similar amount of work optimizing one particular operation as optimizing an existing IL, what is the benefits and drawbacks of the new operation being an IL or NonHelper?
If same level of code quality is warranted, why is the maintenance cost of NonHelper lower than IL?

ymanton commented 6 years ago

These two representations are basically the same if you think about it. One carries its meaning in the opcode and the other carries its meaning in the symbol, but they basically express the exact same thing. The optimizer should be able to optimize popcnt exactly the same way whether it's an opcode or a call. Calls can be side-effect free, have constraints, be reduced to a constant, and all that fun stuff. The criteria of whether something should be expressed as an opcode or an intrinsic call has to be something other than how well it can be optimized because in theory you can teach the optimizer to optimize any equivalent representation the same way.

Here's an alternative criteria: how likely is it to map to a hardware instruction or two on most architectures? Most unary and binary logical and arithmetic ops fit, most floating point ops fit, and popcnt fits. CAS and friends don't fit and should be an intrinsics instead because many (most?) architectures don't have CAS-style atomic instructions, they have LL/SC-style atomic instructions. Some opcodes that we have now, like new and monent and a lot of the Java bytecode inspired ones really don't make sense as opcodes and make a lot more sense as intrinsics, especially since in J9 you can simply convert the opcode to a call and evaluate it and it will "just work" because the symbols for new and monent are the corresponding callable VM helpers.

Having some opcodes evaluate to 1 instruction and others to 100 instructions hints that these things are not exactly alike and shouldn't be treated the same way. Having all opcodes be instruction-like and punting everything that has more complexity than that into the intrinsics bucket seems like a decent way to categorize things.

ymanton commented 6 years ago

Actually, now that I think about @0dvictor mentioned something I overlooked. Our IL can express "pure calls", which are essentially side-effect free calls, but we always anchor calls, which counter-acts that and makes them fundamentally different than IL ops that can appear as children. If we stick to that convention it means that a popcnt intrinsic will never quite behave like a popcnt IL op which doesn't have or need a fixed evaluation point and would benefit from that sort of code motion. That means another criteria could be that if your operation doesn't need a fixed evaluation point then you should make it an opcode because an intrinsic can't do that for you, even if it's pure, and your operation needs a fixed evaluation point it should be an intrinsic.

If you were to combine those two critera you would end up with: opcodes should be things that are likely to be hardware instructions AND don't need a fixed evaluation point, and things that are not likely to be HW instructions or need a fixed evaluation point should be intrinsics.

When I think of examples to fit these categories I can't think of anything that would be likely to be a HW instruction but need a fixed evaluation point. Perhaps instructions that can cause exceptions, like traps and monitored loads, instructions to read timestamps and PMU counters, etc would have to be expressed as intrinsics rather than opcodes, which actually sounds reasonable now that I think of it, but then again these things probably don't need to be represented in the IL in the first place.

Leonardo2718 commented 6 years ago

Forking the discussion to #3051 so this issue can focus on popcnt.

fjeremic commented 6 years ago

Here's an alternative criteria: how likely is it to map to a hardware instruction or two on most architectures?

What do we define as most architectures here? For example on z/Architecture we cannot reduce popcnt to an instruction at the moment. This means to support this IL we would have a moderately complex evaluator.

Another question to ask is how many languages out there support a population count type operation? Do any languages out there support it in the syntax? Is population count mostly defined as a library function in languages that do support the operation? Are library type functions prime candidates for intrinsics?

andrewcraik commented 6 years ago

I don't know that languages exposing a feature is necessary for an opcode - popcnt can be very useful for some kinds of optimization so that might be a reason for it to be an opcode even if languages don't make it readily available. I'm not saying it should be, but I don't know that lack of language usage would preclude us making an opcode for something sufficiently useful and generic.

I think one aspect of this discussion has to be code cleanliness. Looking for intrinsic/nonhelper calls is messy in general - you need to check for a call, check the symref and symbol etc. An opcode check is quite a lot simpler. Consider if add was a nonhelper - the code in simplifier would be massively more verbose and complicated. Opcodes also have a number of properties that a nonhelper call would not have automatically so if opcode-type properties are necessary then that would be a reason to make an opcode (eg for it to participate in a family of related opcodes and respond to opcode capability based queries).

Just a few more thoughts to include in the discussion :)

gita-omr commented 6 years ago

Our IL can express "pure calls", which are essentially side-effect free calls, but we always anchor calls, which counter-acts that and makes them fundamentally different than IL ops that can appear as children.

If we do this for all 'pure' calls it's quite a limitation. Then we need to introduce another flag indicating that a call can be a child.

andrewcraik commented 6 years ago

Pure calls at present are still calls and have to be anchored (I wouldn't expect that to change since it would make validation a nightmare), but it does affect the aliasing etc.

gita-omr commented 6 years ago

Then I would say it's a strong argument towards having an opcode.

andrewcraik commented 6 years ago

It depends on what we need to do with it - if these operations are rare, limited optimizations apply etc then a nonhelper could be the right thing to do since having lots of rarely used opcodes isn't great for maintainability and usability IMO.

ymanton commented 6 years ago

What do we define as most architectures here? For example on z/Architecture we cannot reduce popcnt to an instruction at the moment. This means to support this IL we would have a moderately complex evaluator.

A popcnt evaluator would probably look a lot like a codegen routine that inlined a popcnt intrinsic, you'd end up generating the same sequence in either case.

Another question to ask is how many languages out there support a population count type operation? Do any languages out there support it in the syntax? Is population count mostly defined as a library function in languages that do support the operation? Are library type functions prime candidates for intrinsics?

It's most likely a built-in or library function. I don't know of any language that has an operator for it. But I'd agree with Andrew that that's not really a satisfying criteria. Some languages have a sqrt or pow operator and some have library functions, how would a HW sqrt instruction be represented then?

0xdaryl commented 6 years ago

I'd like to move this discussion to today's architecture meeting: @0dvictor @ymanton @fjeremic @andrewcraik @gita-omr @Leonardo2718 @NigelYiboYu @vijaysun-omr

eclipse-omr / omr

Remove or keep `TR::ipopcnt` and `TR::lpopcnt` IL #3049