eclipse-omr / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr
Other
946 stars 396 forks source link

Problems transmuting TR::Nodes #2338

Open 0xdaryl opened 6 years ago

0xdaryl commented 6 years ago

TR::Nodes (Nodes) have traditionally consumed a significantly large proportion of compile-time memory (mainly due to their sheer quantity in a method, individual memory footprint, and lifetime). Great lengths have been taken to reclaim and reuse Node memory as often as possible, as well as condensing an individual Node's memory footprint through unions and flags word overloading. For example, creating node pools to facilitate recycling of node memory as nodes are no longer needed and packing flags words as dense as possible by overloading flag values between mutually exclusive Nodes.

The latter solution, while effective, has been the source of a number of subtle problems due to a common practice of transmuting Nodes from one opcode to another, often with different semantics (PR #2297 is a recent example). Historically, the problem was much worse because the opcode on a Node was simply changed via setOpCodeValue, leaving the developer to ensure the flags were correct for the new node. setopCodeValue was made a private function and a new family of functions was created to allow a new node to be "recreated" on top of an existing node that was known to be recyclable. While some progress was made on making the recreate functions useful, I don't believe that work was completed in its entirety (I will attach some historical context below). Understanding the meaning of Node flags and which ones were disjoint (and when) was a big part of that work.

This issue invites opinions and discussion on this problem, including:

0xdaryl commented 6 years ago

The following is some historical context on the intended design behind the Node::recreate function and the motivation behind it. It was introduced before the compiler technology was open sourced to avert problems caused by the common practice of simply changing the opcode value on a node without ensuring the properties were correctly set.

This background may not be entirely up-to-date in terms of the names of classes and such, but you should still be able to get the context.

Abstract

When an optimizer makes a transformation, a node is often transmuted from one opCodeValue to another.

This can lead to a variety of bugs because the properties that belonged to the original node, are implicitly inherited by the transmuted node, even if those properties have no meaning, or worse a different meaning for the new opCodeValue. To correct this, those properties that should NOT be inherited are explicitly unset.

This is the wrong way round. Properties that should be inherited should be explicitly set, and those that are not inherited should be implicitly unset.

The reason it is the wrong way round is that a Node's opcode value is mutable. This should not be the case. It should never be allowed to change a Node's opcode value. A Node is essentially an immutable object. A change like this should require the allocation of a new Node.

A Node should be made immutable, in that its opcode value cannot be modified once the node has been created.

The technique we describe here could also be applied to other properties of Nodes that should be considered immutable. For example: the intValue on an iconst node. Beyond immutable properties there are other properties that should be monotonic. For example: isNonNull sounds like a monotonic property. Once non-nullness of a node has been asserted, does it makes sense for this property to be ever removed? This design isn't really addressing monotonic properties.

Immutable Properties of a Node

Let us consider why the opCodeValue of a Node should be considered an immutable property.

Under an optimizer we are allowed to transmute an expression as long as its "abstract value" remained unchanged. To keep the expression's abstract value unchanged, we may to change several things at once. It is most often true that changing the opcode alone will change the abstract value of the expression. For example consider the transmutation (for positive integer x):

   x * 2  ---->  x << 1

this is represented as:

     *                 <<
   /   \    ---->    /    \
  x     2           x      1

The expressions above have the same abstract value, because they mean the same thing. The transmutation shown is valid. But this is only true because several things happen at once: the * becomes a << and the 2 is replaced by a 1. In general there may be lots of things that have to happen at once: many properties of the Node, and of the children. We have to consider the whole expression not just the top level Node here. You can't just change the opcode value of that node. So if the * is transmuted to a << and its children remain the same, then the expression is incorrect. x * 2 does not mean the same as x << 2. It is this temporary inconsistency that we wish to avoid because it leads to bugs. The answer is to disallow that intermediate (and incorrect) transmutation by not allowing just the node's opCodeValue to be altered. Instead the node expression must be recreated, so the transmutation of the expression is done atomically.

Transmutating a Node Expression In Place

If we were to create a new node to replace an old one, we would normally need to update all of the references to the original node. For instance, suppose in the transformation

x * (2 ** n)  ---->  x << n

we wanted to to replace imul by ishl, then we would normally need to update all of the references of the original imul node (the expression of which may be commoned) to refer to the new ishl node. Finding all these places can be expensive.

But with the introduction of the nodepool we can avoid this expense; we can allocate a new node over the old one (any node is the same size as any other), such that its address (ie index in the node pool) is unchanged. We do not need to find and update any references, and therefore this allows us to implement immutable nodes with little overhead.

Normally we would create a node like this:

TR::Node *node = TR::Node::create(...);

and create would create a new Node from the NodePool like this:

TR::Node *node = new (comp->getNodePool()) TR_Node(...);

where the node pool allocator would add a new Node entry to the nodepool table, and return a reference to that entry.

But for our purpose, we wish to have an alternative Node interface, recreate which creates a new Node over the top of the original Node:

TR::Node *recreatedNode = TR::Node::recreate(originalNode, ...);

The returned value, the address of the recreated node, should be identical to the address of the original Node passed to recreate. Then we would recreate the new Node over the old one like this (we may also need to "destroy" originalNode without deallocating its memory from the node pool):

originalIndex = originalNode->getNodePoolIndex();
TR::Node *node = new (comp->getNodePool(), originalIndex) TR_Node(...);

where we provide an alternative form of the new operator with an extra parameter, the index of the original Node that we wish to allocate on top of. In this case, instead of adding an entry to the table, the allocator just checks the index and returns the original storage, which is cleared and initialized in the normal way.

Using the new immutable recreate interface, there will need to be care taken to explicitly set those properties under re-creation node that were implicitly set under transmutation. Similarly the explicit unsetting of properties under transmutation may be removed from the codebase under re-creation.

This will lead to code that is more explicit about what it means to do, and taking the usual default properties expected under node creation, rather than the reverse which is currently the case, and which leads to unintuitive behaviour and bugs.

What Properties Should be Automatically Copied from Original Node During Transmutation

Some properties are meaningful after a transmutation. Thus dependant on the originalNode op and the transmutedNode op, there will be a set of properties, symRefs and children that may be automatically copied from the original node during the transmutation. A mapping function:

void TR_Node::copyValidProperties(TR_Node *fromNode, TR_Node *toNode);

will achieve this. To complete this function it is necessary to understand what properties are valid for a given node type.

Recreate functions that do not copy properties

When the replacement Node bears no or little relationship to the original node, little or no properties should be copied to the new node. Thus some recreate functions do not copy any properties, children or symRefs, but do recreate the node with new children and new symRefs. These are:

   static TR::Node *recreate(TR::Node *originalNode, OMR::ILOpCodes op, uint16_t numChildren, [TR::Node *first, ... ]);
   static TR::Node *recreateWithSymRef(TR::Node *originalNode, OMR::ILOpCodes op, uint16_t numChildren, uint16_t numChildArgs, TR::Node *first, ChildrenAndSymRefType... childrenAndSymRef);

Recreate functions that do copy properties

In practice though, the above type of mutation is rare. In most cases the replacement node has a close relationship to the original node. Thus the more common interface is this:

   static TR::Node *recreateAndCopyValidProperties(TR::Node *originalNode, OMR::ILOpCodes op);

The function recreateAndCopyValidProperties recreates a replacement Node with the new opcode, but with all the original properties, children and symRefs as the original as long as they may be valid for the new opcode. This is a common way of using setOpCodeValue in TR, but unlike setOpCodeValue, recreateAndCopyValidProperties will clear invalid properties, which avoids some of the issues that occurred with setOpCodeValue.

The only properties that may be cleared are those that have defensive asserts on their setters/getters. This avoids issues where code does this common pattern:

   constNode = loadNode;
   constNode->setLongInt(value);
   TR_DeprecatedNodeInterface::setOpCodeValue(constNode, OMR::lconst);

Here the load is transmuted to a const after the const value has been set; as a const value is "illegal" on a load, we would have cleared this property when the recreate (to replace setOpCodeValue) occurs. To guard against this, we need to first find where all those "illegal" setters/getters in the code (for all node properties) occurs, and re-order them, like so:

   constNode = loadNode;
   TR_DeprecatedNodeInterface::setOpCodeValue(constNode, OMR::lconst);
   constNode->setLongInt(value);

This is a painstaking task and requires detailed analysis, as some of the cases where properties are set "invalid" are actually bugs, or could be considered bugs. These bugs need to be differentiated from the above re-ordering problem. Thus to avoid having to solve all these issues before delivering the "copying valid properties" code, the initial implementation copies far more properties than we know to be valid, and properties are only cleared if the above fixup has been carried out and setter/getter asserts have been introduced for those properties.

fjeremic commented 6 years ago

whether flags words and other Node fields should be so densely packed and whether disjoint flags are appropriate

I counted a total of 125 node flags at the moment:

https://github.com/eclipse/omr/blob/e95cd1854a984b0ac3e9e50b0fa31520523231e2/compiler/il/OMRNode.hpp#L1781-L2076

Having already seen a few, there are flags which are completely unused and can be deprecated. I believe we can lower this list significantly. That said I'm in the opinion that we should not overload flag values and avoid these super subtle bugs which can take days or weeks to debug.

whether and how the recreate functions should set flags on transmuted nodes,

This becomes a non-issue with non-overloaded flags. As part of IL OpCode properties we can define the set of valid flags as a mask value which can then be used to logically AND with when transmuting nodes.

In general we have N number of nodes and M flag values. There is a total of N*N*M/2 combinations of transmutations for every flag. While there does "groups" of IL which can be transmuted, in general we should be able to transmute any node to any other node as long as the type of the IL is valid. This makes it very difficult to classify these "groups" of IL which can be transmuted as there are so many possibilities. Furthermore if node flags are overloaded you have to make sure that two overloaded flag values do not belong to the same "group". Any additions of new IL also has to address the problem of the "groups" then changing as suddenly the new IL could introduce an overlap for an overloaded flag.

I don't think this is maintainable or feasible moving forward. We currently also do not enforce which flags can be set on which IL. I believe with non-overloaded flag values we should be defining the set of flags which are valid for each and every IL and asserting if we attempt to set a flag on an IL which does not support it.

whether flags words and other Node fields should be so densely packed and whether disjoint flags are appropriate

Another option is to remove flags from nodes altogether and instead have "facilities" which keep track of node flags. This avoids any additional state being carried with a node. What I'm envisioning is instead of doing something like this:

// Sets node->_flags |= OMR::Node::nodeIsZero;
node->setIsNonZero()
...
if (node->isNonZero())
    ...

We could do

// Adds node to a list or bitvector of non-zero nodes
OMR::Facilities::IsNonZero.set(node);
...
if (OMR::Facilities::IsNonZero(node))
    ...

This completely decouples any additional state from the OMR::Node. Any optimization should be able to use whatever facility and this is easily extensible by downstream projects as they can add their own facilities. The facilities themselves just keep a bitvector or a list of which nodes have been set.

I haven't thought this through fully or have data to back up the feasibility of this but removing a field for every single node will reduce the footprint and you would then have the additional footprint for keeping track of which node has which flag set which should be 1 bit for every flag set for every node in the compilation.

So the question in terms of footprint here is what is larger?

  1. Number of nodes in a compilation * number of bits used to represent flags
  2. Number of nodes in a compilation * number of distinct flags set in the entire compilation

My bet is that 2. is smaller, so this would be a footprint savings, since for 2. if a facility is never used in a compilation there is no need to allocate any bitvector, etc.

Edit:

Having looked at some of the exotic node flags we have in the above list I have a strong belief 2. will be smaller as some of these flag values are so very specific they will hardly ever be used, yet we are wasting one bit for every node in the compilation for them.

dsouzai commented 6 years ago

This completely decouples any additional state from the OMR::Node. Any optimization should be able to use whatever facility and this is easily extensible by downstream projects as they can add their own facilities. The facilities themselves just keep a bitvector or a list of which nodes have been set.

My bet is that 2. is smaller, so this would be a footprint savings, since for 2. if a facility is never used in a compilation there is no need to allocate any bitvector, etc.

+1 on this idea.

There is one aspect worth considering though; if we still wish to use the NodePool to transmute a node, we will have to go through all the facilities in order to "zero out" flags. However, I think we can do better than query every single facility. If we have a per-compilation map of facilities, this map can be populated lazily; when transmuting a node, we only have to iterate through the facilities in the map.