jdmpapin commented 5 years ago

IL Opcode Name: newvalue

Motivation and Description

The purpose of this IL opcode is to combine the allocation and initialization of an immutable object into a single step, removing the need to separately set the value of each field. This opcode has a symbol reference specifying the type to construct, and children of variable number and types (determined by the symbol reference), similar to a call. Alternately, the type to construct could be specified using an additional first child. Because the number and types of the children depend on the type to be allocated, it is unlikely that there is use case for allowing that type to vary at runtime, though perhaps using the first child would help newvalue to "look like" new.

We can think of the allocation and initialization as happening atomically, so that the fields of a type whose instances are created this way are never mutated explicitly in the IL. This has benefits for optimization:

When loading a field from the result of newvalue, it is always possible to simply replace the load with the corresponding child of newvalue.
The absence of stores to fields of immutable types will benefit existing optimizations that have no knowledge of newvalue. For a given load, no conflicting store will be found anywhere.
Zero- or default-initialization can always be skipped for these allocations.

Further, the resulting object may be "identityless" in that its identity is either unobservable or irrelevant/unreliable. This will apply to value types in Java, and it would also apply to immutable types in functional languages with weak/no notion of reference equality (e.g. OCaml?). Knowing that the result is identityless allows additional transformations, such as rematerialization. However, the newvalue opcode will be of broader applicability if it does not require that the result be identityless. This property could be expressed using a node flag, or an additional similar IL opcode.

Alternatives

One alternative is to implement immutable objects using explicit initialization. This would fail to provide the optimization benefits mentioned above, and desirable future transformations such as scalarization would require additional analysis to find field values (even though in practice they will be close at hand whenever the allocation is).

Another alternative is to represent the newvalue operation as an "intrinsic call," i.e. as a call with a special symbol reference. This would force the type to be specified by a child, and the presence of a call might cause optimizations to be unnecessarily conservative. Note that the proposed aintrinsic opcode would be inappropriate because newvalue will need to be anchored.

Finally, OMR could leave the representation of the allocation and initialization of immutable objects to be determined by individual downstream projects. However, immutable types are a relatively common language feature.

Homogeneity

Does this IL opcode have the same basic structure and operation as other IL opcodes, or does it behave differently in any significant ways? Please elaborate.

The newvalue opcode has some similarities to the call family of opcodes, having children of varying number and type, but it does not read memory or mutate previously allocated memory. The only possible side effects are allowing GC to run, or responding to an out-of-memory condition, e.g. by throwing an exception.

It also has some similarities to new. It allocates, and it is a GC point. The type of the resulting object is known exactly. If unused, the allocation can be removed.

Evaluation works in the usual way. There are no interactions with the structure/syntax of the children, unless the first child is the type, in which case it might be required to be constant (loadaddr).

In the presence of restrictions preventing internal pointers from being commoned across GC points, no child can be an internal pointer.

Structure

Data Type

Address

Children

The number and types of the children are variable, as in the call family of opcodes. Immutable types are generally "small," so the children should usually be relatively few.

The purpose of each child is to provide the initial (and only) value of the field corresponding to its position for the newly created object.

Symbol Reference

The symbol reference specifies the type of object to be allocated and initialized. Alternately, if the type is to be specified by an extra first child, there should be a generic symbol reference similar to (or possibly the same as) jitNewValue.

IL Opcode Properties

Provide the set of IL opcode properties applicable to this IL opcode.

The properties will be similar to, if not the same as, those for new:

HasSymbolRef
MayUseSystemStack
CanRaiseException
New

It's possible that the New property could engage logic in the compiler that does not apply properly to newvalue, e.g. looking at the first child to determine the type. In that case, New would have to be left off, or the inapplicable logic adjusted to account for newvalue.

The new opcode and related opcodes (newarray, anewarray, etc.) also have the LikeDef property, though I do not know why, so it is unclear whether this property also applies to newvalue.

Describe the semantics of any new IL opcode properties required for this IL opcode.

None.

Node Flags

Describe the Node flags that will be set on Nodes with this IL opcode.

One new node flag identityless, or perhaps anonymous, indicating that the resulting object has no identity.

Control Flow

An exception may be thrown.

Operation

Provide a detailed description of the operation of this IL opcode. Use pseudo-code if necessary.

An instance of the specified type is allocated (as though via new), and each field is initialized using the value of the corresponding child.

On platforms with a weak memory model (e.g. POWER), a memory fence is required after initialization but before any write that could publish the resulting reference. Without such a fence, other threads could observe uninitialized fields.

Result Value

A reference to the freshly allocated object.

Side Effects

An exception may be thrown.

Anchoring

This operation must be anchored because it is a GC point and an exception point.

Scope

How will this IL opcode be handled by the optimizer? Is there a reasonable enough chance of optimization occurring to warrant introducing a new IL opcode rather than using an intrinsic?

Use of this opcode is likely to create optimization opportunities of the kinds mentioned in the motivation.

Will this IL opcode be useful on all code generator backends in Eclipse OMR? If not, please explain.

The benefits of this opcode are platform-independent.

Does this IL opcode map to a machine instruction or require specialized hardware to support?

No.

Implementation Dependencies

Describe any implementation dependencies this IL opcode incurs. For example, does it require:

Special support to be implemented in one or more optimizations before it can be used?

All backends to implement it simultaneously?

Special backend support (e.g., an out-of-line path to handle an exceptional case)?

Implementation in all downstream projects consuming OMR?

Codegen Node lowering, and if so, to what trees?

The newvalue opcode has a number of implementation dependencies. Note that project-specific logic is only necessary for projects making use of newvalue, not necessarily every project.

Field Info

First, in order to generate, transform, or evaluate a newvalue node, it will be necessary to enumerate the fields of the type to be allocated, and to impose a consistent order on those fields. For a type used with newvalue, the compiler should know the number of fields, and for each field it should know:

the field offset (as would be used to read the field);
the TR::DataType of the field; and
for debugging, the name of the field (if applicable).

This information comes from project-specific type introspection (e.g. provided via a ClassEnv method), but it can be used in a project-independent way by transformations eliminating loads, and to implement the initialization.

Lowering

Second, lowering should be performed before evaluation to avoid effectively duplicate evaluators. Consider the following tree, creating a Vec3 with coordinates from local variables x, y, and z.

treetop
  newvalue Vec3
    fload x
    fload y
    fload z

This tree can be lowered to a sequence that first allocates the object, and then initializes each field:

treetop
  fload x
treetop
  fload y
treetop
  fload z
treetop
  new (skipZeroInit)
    loadaddr Vec3
fstorei Vec3.x
  ==>new
  ==>fload x
fstorei Vec3.y
  ==>new
  ==>fload y
fstorei Vec3.z
  ==>new
  ==>fload z
(memory fence here if required)

It is important to note that this separation does not interfere with earlier optimizations. It will result in essentially the same output code we would get by evaluating newvalue directly.

The anchoring will often be unnecessary, as it is in this example, but in general it may be required to prevent commoning an internal pointer across new.

Field Shadow Fabrication

To generate the initialization sequence during lowering, the compiler needs a shadow symbol reference for each field. The shadow for a field may need to be fabricated based on the field information mentioned above. These shadows will also be useful for other purposes, such as privatization.

The fabricated field shadows should be unified with any corresponding "naturally occurring" shadows, e.g. from a Java constant pool, to prevent missed opportunities. Because the naturally occurring shadows are generated in a project-specific way, OMR would need to request shadows through an extension point that can be overridden on a per-project basis.

Performance

This opcode should facilitate performance improvements by aiding optimization related to the use of immutable objects.

Testing Considerations

Describe how this proposed IL opcode will be tested in a project-independent manner that can be automated in Eclipse OMR CI builds. If Tril unit tests cannot be provided, please explain and propose an alternative.

This opcode has multiple project-specific dependencies. Field shadow fabrication can have a default implementation in OMR that produces usable symbol references with no concern for naturally occurring shadows, but type definition and introspection would need to be mocked somehow for Tril tests, as would the implementation of new.

IL Validation

Will any changes to the IL Validator be required to validate trees using this IL opcode and its properties or characteristics? If so, please elaborate.

TBD

fjeremic commented 5 years ago

Side note that to get full benefit of value types on Z we may need to resurrect projects like dual-TLH or evaluate the performance of doing per-object zero initialization rather than our current default which is bulk zeroing.

0xdaryl commented 5 years ago

Thanks for the discussion in the architecture meeting (#4039) today. @vijaysun-omr will repeat some of the questions raised during the discussion in this issue so that their resolution can be documented here.

vijaysun-omr commented 5 years ago

I'll ask some of the questions that came up in the discussion with @jdmpapin

What is the aliasing on the symbol reference used by the newvalue opcode ?
What is the answer to isNew() for the newvalue opcode ?
Is there a new opcode needed to represent a newvalue that gets stack allocated contiguously to reap the same benefits that we get from newvalue ?
Do we need to consider this same kind of an opcode for immutable arrays ?
You listed some of the advantages of going with the new opcode from an optimization perspective. I assume there are no functional issues with representing the same input as a new followed by stores to initialize the newly allocated object, i.e. the newvalue opcode is not needed to avoid some functional problem(s) ?

ymanton commented 5 years ago

Spoke to @jdmpapin offline and tuples in Python came up as a good use-case for this; especially since Python also has classes and initialization routines. In retrospect that example really helps me imagine how this might work, how it differs from what we already have, and what the benefits would be.

jdmpapin commented 5 years ago

What is the aliasing on the symbol reference used by the newvalue opcode ?

As a GC point, its "use-def aliasing" will need to include gcSafePointSymRefNumbers() in cases where those are required for other GC points, e.g. new. While no state is observed, new and other exception points have "use-only" aliasing including defaultMethodUseAliases(), which I believe is there to account for cases in which an exception is thrown without a handler in the same frame. If so, newvalue nodes will need the same "use-only aliasing."

These aliasing properties should not be associated with the type to allocate, so the symbol reference can't represent the type, which will have to be specified via a child. I think we should be able to use the same symbol reference as is used for new (reference number TR_newObject), e.g.

treetop
  newvalue jitNewValue
    loadaddr Vec3
    fload x
    fload y
    fload z

In order to generate code for this, the type must be known, so this first child is constrained to be constant (loadaddr). I am personally not a fan of such constraints, but I don't see a way around it.

What is the answer to isNew() for the newvalue opcode ?

Likely true, especially now that it seems the first child will indicate the type to allocate. I claimed during the meeting that optimizations could only misbehave due to isNew() because of differing tree shape, but I misspoke. Some optimizations might assume that the fields are zero/null until overwritten, and any such optimization would need to be adjusted.

Is there a new opcode needed to represent a newvalue that gets stack allocated contiguously to reap the same benefits that we get from newvalue ?

Contiguous stack allocation cannot be done for newvalue in the same way it is done today. To summarize for those unfamiliar, we create a "local object" symbol for which we will reserve space in the stack frame. A loadaddr of the local object symbol produces the appropriate pointer into the stack to be used as the object reference. We replace new with a series of indirect writes through this loadaddr, and the loadaddr itself stands in for new. If we were to attempt to do this with newvalue, we would create the very stores that newvalue was meant to prevent. This is the crux of @vijaysun-omr's original question.

However, it is possible to do contiguous stack allocation differently for these types, if in the future we find that it is desirable to do so. We could add a second new opcode stacknewvalue (real name TBD), which takes the local object loadaddr as an additional child. This makes a distinction between the pointer to the storage to use for the object (loadaddr), which is fixed in a given invocation of the compilee, and the resulting object reference (stacknewvalue), which semantically is a fresh reference every time stacknewvalue is evaluated. This way, it's possible to do contiguous stack allocation without introducing undesirable stores. Of course, the implementation of stacknewvalue will still need to store to memory, just as in the case of newvalue, but it can be lowered in much the same way.

Do we need to consider this same kind of an opcode for immutable arrays ?

I don't anticipate an analogous opcode for arrays. The length of an array may be unknown at compile-time, in which case it's not possible to use the child-per-element approach. The creation of such an array might be side-effect free in the source code, but I believe the implementation necessarily has to rely on mutation. There may be cases that could be represented in a way similar to newvalue, such as repeating a value n times, or constructing an array of length 3 from values x0, x1, x2, but stores into the resulting arrays will be permissible operations as long as those arrays are of the same type as in the variable-length case.

You listed some of the advantages of going with the new opcode from an optimization perspective. I assume there are no functional issues with representing the same input as a new followed by stores to initialize the newly allocated object, i.e. the newvalue opcode is not needed to avoid some functional problem(s) ?

Correct, there is no functional reason that immutable types could not be implemented using new and stores. The newvalue opcode is purely to help create optimization opportunities and for ease of implementation of transformations targetting immutable objects.

vijaysun-omr commented 5 years ago

Thanks, those are very satisfactory answers to my questions.

Do you anticipate newvalue to be eligible for commoning and PRE (unlike new for example) ? e.g. if we had a newvalue node in a loop with all the children being invariant in the loop, could we move it out of the loop ? I think this may be possible because the newvalue itself has no concept of identity (?)

jdmpapin commented 5 years ago

Do you anticipate newvalue to be eligible for commoning and PRE (unlike new for example) ?

Now that you mention it, yes, I think such transformations could be applicable to newvalue nodes that are flagged as identityless. For code motion, we'll have to think about moving the exception point. It's really only an exception in an OOM scenario, and it's probably common that a language implementation (of a sufficiently dynamic language) is free to allocate memory (and therefore introduce a possibility that allocation could fail) at just about any point. This is the same issue scalarization will face in cases where some paths may require boxing.

jdmpapin commented 5 years ago

One more thought about EA: Something akin to stacknewvalue could remove an existing incongruity within the compiler caused by our current contiguous stack allocation, which is that certain optimizations "know" that an object's vtable pointer is immutable, but nonetheless we sometimes mutate the vtable pointer.

jdmpapin commented 5 years ago

tuples in Python came up as a good use-case for this

For those familiar with tuples in Python, I think that in common cases they present a good mental model in terms of what it means to create an immutable value: (foo(x), bar(y)) means to compute temporaries t0=foo(x), then t1=bar(y), then as a single final step, create the tuple (t0, t1). However, Python's tuples are effectively immutable arrays of potentially unknown size, e.g. tuple(i for i in range(n)), so I doubt that newvalue would be useful for implementing them.

vijaysun-omr commented 5 years ago

Okay, I have no fundamental objections to this opcode being added given the above discussion.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment on the issue or it will be closed in 60 days.

Leonardo2718 commented 4 years ago

This opcode now exists in OMR this issue can closed.

eclipse / omr

Compiler IL Opcode Request: newvalue #4028

Motivation and Description

Alternatives

Homogeneity

Structure

Data Type

Children

Symbol Reference

IL Opcode Properties

Node Flags

Control Flow

Operation

Result Value

Side Effects

Anchoring

Scope

Implementation Dependencies

Field Info

Lowering

Field Shadow Fabrication

Performance

Testing Considerations

IL Validation