Closed jdmpapin closed 4 years ago
Side note that to get full benefit of value types on Z we may need to resurrect projects like dual-TLH or evaluate the performance of doing per-object zero initialization rather than our current default which is bulk zeroing.
Thanks for the discussion in the architecture meeting (#4039) today. @vijaysun-omr will repeat some of the questions raised during the discussion in this issue so that their resolution can be documented here.
I'll ask some of the questions that came up in the discussion with @jdmpapin
What is the aliasing on the symbol reference used by the newvalue opcode ?
What is the answer to isNew() for the newvalue opcode ?
Is there a new opcode needed to represent a newvalue that gets stack allocated contiguously to reap the same benefits that we get from newvalue ?
Do we need to consider this same kind of an opcode for immutable arrays ?
You listed some of the advantages of going with the new opcode from an optimization perspective. I assume there are no functional issues with representing the same input as a new followed by stores to initialize the newly allocated object, i.e. the newvalue opcode is not needed to avoid some functional problem(s) ?
Spoke to @jdmpapin offline and tuples in Python came up as a good use-case for this; especially since Python also has classes and initialization routines. In retrospect that example really helps me imagine how this might work, how it differs from what we already have, and what the benefits would be.
- What is the aliasing on the symbol reference used by the newvalue opcode ?
As a GC point, its "use-def aliasing" will need to include gcSafePointSymRefNumbers() in cases where those are required for other GC points, e.g. new. While no state is observed, new and other exception points have "use-only" aliasing including defaultMethodUseAliases(), which I believe is there to account for cases in which an exception is thrown without a handler in the same frame. If so, newvalue nodes will need the same "use-only" aliasing.
These aliasing properties should not be associated with the type to allocate, so the symbol reference can't represent the type, which will have to be specified via a child. I think we should be able to use the same symbol reference as is used for new (reference number TR_newObject), e.g.
treetop
  newvalue jitNewValue
    loadaddr Vec3
    fload x
    fload y
    fload z
In order to generate code for this, the type must be known, so this first child is constrained to be constant (loadaddr). I am personally not a fan of such constraints, but I don't see a way around it.
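The tree shape above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the Node class and the helpers make_newvalue, allocated_type, and dump are hypothetical names for illustration, not OMR APIs. It shows the type travelling as a constant first child while the symbol reference stays generic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy IL node: an opcode, an optional symbol reference name, and children."""
    opcode: str
    symref: str = ""
    children: List["Node"] = field(default_factory=list)


def make_newvalue(type_name, field_values):
    """Build a newvalue node whose first child names the type to allocate."""
    type_child = Node("loadaddr", type_name)
    return Node("newvalue", "jitNewValue", [type_child, *field_values])


def allocated_type(node):
    """Return the type named by the first child, which must be a constant loadaddr."""
    if node.opcode != "newvalue":
        raise ValueError("not a newvalue node")
    first = node.children[0]
    if first.opcode != "loadaddr":
        raise ValueError("type child must be a constant loadaddr")
    return first.symref


def dump(node, depth=0):
    """Render the tree as indented lines, one node per line."""
    lines = ["  " * depth + (node.opcode + " " + node.symref).strip()]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


vec3 = make_newvalue("Vec3", [Node("fload", name) for name in "xyz"])
tree = Node("treetop", children=[vec3])
print("\n".join(dump(tree)))
```

Because allocated_type rejects any first child other than loadaddr, the "constant type" constraint is checked in one place rather than assumed by every consumer.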
- What is the answer to isNew() for the newvalue opcode ?
Likely true, especially now that it seems the first child will indicate the type to allocate. I claimed during the meeting that optimizations could only misbehave due to isNew() because of differing tree shape, but I misspoke. Some optimizations might assume that the fields are zero/null until overwritten, and any such optimization would need to be adjusted.
- Is there a new opcode needed to represent a newvalue that gets stack allocated contiguously to reap the same benefits that we get from newvalue ?
Contiguous stack allocation cannot be done for newvalue in the same way it is done today. To summarize for those unfamiliar, we create a "local object" symbol for which we will reserve space in the stack frame. A loadaddr of the local object symbol produces the appropriate pointer into the stack to be used as the object reference. We replace new with a series of indirect writes through this loadaddr, and the loadaddr itself stands in for new. If we were to attempt to do this with newvalue, we would create the very stores that newvalue was meant to prevent. This is the crux of @vijaysun-omr's original question.
However, it is possible to do contiguous stack allocation differently for these types, if in the future we find that it is desirable to do so. We could add a second new opcode stacknewvalue (real name TBD), which takes the local object loadaddr as an additional child. This makes a distinction between the pointer to the storage to use for the object (loadaddr), which is fixed in a given invocation of the compilee, and the resulting object reference (stacknewvalue), which semantically is a fresh reference every time stacknewvalue is evaluated. This way, it's possible to do contiguous stack allocation without introducing undesirable stores. Of course, the implementation of stacknewvalue will still need to store to memory, just as in the case of newvalue, but it can be lowered in much the same way.
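To make the contrast concrete, here is a rough Python sketch of the two shapes. Everything in it is hypothetical: the tuple encoding of nodes, the generic "storei" opcode name, the function names, and in particular the position of the storage child within stacknewvalue, which the discussion above does not fix.

```python
def stack_allocate_new(field_stores, local_object):
    """Approach used for new today: indirect stores through a loadaddr of the local object.

    Returns the store trees plus the loadaddr that stands in for the original new.
    """
    base = ("loadaddr", local_object)
    trees = [("storei", field_name, base, value) for field_name, value in field_stores]
    return trees, base


def stack_allocate_newvalue(type_name, children, local_object):
    """Hypothetical stacknewvalue: the storage pointer is just one more child.

    No stores appear in the IL; the node itself is the fresh object reference.
    Placing the storage child first is an assumption made for this sketch.
    """
    storage = ("loadaddr", local_object)
    return ("stacknewvalue", storage, ("loadaddr", type_name), *children)


stores, reference = stack_allocate_new([("x", "fload x"), ("y", "fload y")], "localVec")
node = stack_allocate_newvalue("Vec3", ["fload x", "fload y", "fload z"], "localVec3")
print(stores)
print(node)
```

In the second shape the loadaddr identifies storage only, so an optimization that sees stacknewvalue evaluated twice has no reason to treat the two results as the same reference.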
- Do we need to consider this same kind of an opcode for immutable arrays ?
I don't anticipate an analogous opcode for arrays. The length of an array may be unknown at compile-time, in which case it's not possible to use the child-per-element approach. The creation of such an array might be side-effect free in the source code, but I believe the implementation necessarily has to rely on mutation. There may be cases that could be represented in a way similar to newvalue, such as repeating a value n times, or constructing an array of length 3 from values x0, x1, x2, but stores into the resulting arrays will be permissible operations as long as those arrays are of the same type as in the variable-length case.
- You listed some of the advantages of going with the new opcode from an optimization perspective. I assume there are no functional issues with representing the same input as a new followed by stores to initialize the newly allocated object, i.e. the newvalue opcode is not needed to avoid some functional problem(s) ?
Correct, there is no functional reason that immutable types could not be implemented using new and stores. The newvalue opcode is purely to help create optimization opportunities and for ease of implementation of transformations targeting immutable objects.
Thanks, those are very satisfactory answers to my questions.
Do you anticipate newvalue to be eligible for commoning and PRE (unlike new for example) ? e.g. if we had a newvalue node in a loop with all the children being invariant in the loop, could we move it out of the loop ? I think this may be possible because the newvalue itself has no concept of identity (?)
Do you anticipate newvalue to be eligible for commoning and PRE (unlike new for example) ?
Now that you mention it, yes, I think such transformations could be applicable to newvalue nodes that are flagged as identityless. For code motion, we'll have to think about moving the exception point. It's really only an exception in an OOM scenario, and it's probably common that a language implementation (of a sufficiently dynamic language) is free to allocate memory (and therefore introduce a possibility that allocation could fail) at just about any point. This is the same issue scalarization will face in cases where some paths may require boxing.
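As a rough illustration of the commoning case, the Python sketch below reuses an earlier identityless newvalue when a later one has the same type and the same children. The NewValue record and common_newvalues are hypothetical names, children are represented by the names of already-evaluated values, and the sketch assumes none of those values is redefined between the nodes. It does not address moving the exception point.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NewValue:
    """Toy stand-in for a newvalue node within one extended block."""
    type_name: str
    children: Tuple[str, ...]
    identityless: bool = False


def common_newvalues(nodes):
    """For each node, return the index of the node whose result it can use.

    A node maps to an earlier index only when both are identityless and agree
    on type and children; otherwise it maps to its own index.
    """
    seen = {}
    result = []
    for index, node in enumerate(nodes):
        key = (node.type_name, node.children)
        if node.identityless and key in seen:
            result.append(seen[key])
        else:
            result.append(index)
            if node.identityless:
                seen[key] = index
    return result


nodes = [
    NewValue("Vec3", ("x", "y", "z"), identityless=True),
    NewValue("Vec3", ("x", "y", "z"), identityless=True),   # can reuse node 0
    NewValue("Vec3", ("x", "y", "z"), identityless=False),  # identity matters, keep
    NewValue("Vec3", ("x", "y", "w"), identityless=True),   # different children
]
print(common_newvalues(nodes))
```

Nodes without the flag are never merged, which matches the point that the transformation depends on the result having no observable identity.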
One more thought about EA: Something akin to stacknewvalue could remove an existing incongruity within the compiler caused by our current contiguous stack allocation, which is that certain optimizations "know" that an object's vtable pointer is immutable, but nonetheless we sometimes mutate the vtable pointer.
tuples in Python came up as a good use-case for this
For those familiar with tuples in Python, I think that in common cases they present a good mental model in terms of what it means to create an immutable value: (foo(x), bar(y)) means to compute temporaries t0=foo(x), then t1=bar(y), then as a single final step, create the tuple (t0, t1). However, Python's tuples are effectively immutable arrays of potentially unknown size, e.g. tuple(i for i in range(n)), so I doubt that newvalue would be useful for implementing them.
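The mental model can be checked directly in Python. In this small sketch foo and bar are placeholder functions that record when they are called; the tuple display and the explicit temporaries behave the same way, while the generator form shows a tuple whose length is only known at run time.

```python
calls = []


def foo(x):
    calls.append("foo")
    return x + 1


def bar(y):
    calls.append("bar")
    return y * 2


# Tuple display: elements are evaluated left to right, then the tuple is built.
direct = (foo(1), bar(2))
calls_direct = list(calls)

# The same thing spelled out with temporaries and one final construction step.
calls.clear()
t0 = foo(1)
t1 = bar(2)
stepwise = (t0, t1)
calls_stepwise = list(calls)

# A tuple of run-time length: there is no fixed set of children to list.
n = 4
variable = tuple(i for i in range(n))

print(direct, stepwise, variable)
```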
Okay, I have no fundamental objections to this opcode being added given the above discussion.
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment on the issue or it will be closed in 60 days.
This opcode now exists in OMR, so this issue can be closed.
IL Opcode Name:
newvalue
Motivation and Description
The purpose of this IL opcode is to combine the allocation and initialization of an immutable object into a single step, removing the need to separately set the value of each field. This opcode has a symbol reference specifying the type to construct, and children of variable number and types (determined by the symbol reference), similar to a call. Alternately, the type to construct could be specified using an additional first child. Because the number and types of the children depend on the type to be allocated, it is unlikely that there is a use case for allowing that type to vary at runtime, though perhaps using the first child would help newvalue to "look like" new.
We can think of the allocation and initialization as happening atomically, so that the fields of a type whose instances are created this way are never mutated explicitly in the IL. This has benefits for optimization:
- When loading a field from the result of newvalue, it is always possible to simply replace the load with the corresponding child of newvalue.
- The absence of stores to fields of immutable types will benefit existing optimizations that have no knowledge of newvalue. For a given load, no conflicting store will be found anywhere.
- Zero- or default-initialization can always be skipped for these allocations.
Further, the resulting object may be "identityless" in that its identity is either unobservable or irrelevant/unreliable. This will apply to value types in Java, and it would also apply to immutable types in functional languages with weak/no notion of reference equality (e.g. OCaml?). Knowing that the result is identityless allows additional transformations, such as rematerialization. However, the newvalue opcode will be of broader applicability if it does not require that the result be identityless. This property could be expressed using a node flag, or an additional similar IL opcode.
Alternatives
One alternative is to implement immutable objects using explicit initialization. This would fail to provide the optimization benefits mentioned above, and desirable future transformations such as scalarization would require additional analysis to find field values (even though in practice they will be close at hand whenever the allocation is).
Another alternative is to represent the newvalue operation as an "intrinsic call," i.e. as a call with a special symbol reference. This would force the type to be specified by a child, and the presence of a call might cause optimizations to be unnecessarily conservative. Note that the proposed aintrinsic opcode would be inappropriate because newvalue will need to be anchored.
Finally, OMR could leave the representation of the allocation and initialization of immutable objects to be determined by individual downstream projects. However, immutable types are a relatively common language feature.
Homogeneity
The newvalue opcode has some similarities to the call family of opcodes, having children of varying number and type, but it does not read memory or mutate previously allocated memory. The only possible side effects are allowing GC to run, or responding to an out-of-memory condition, e.g. by throwing an exception.
It also has some similarities to new. It allocates, and it is a GC point. The type of the resulting object is known exactly. If unused, the allocation can be removed.
Evaluation works in the usual way. There are no interactions with the structure/syntax of the children, unless the first child is the type, in which case it might be required to be constant (loadaddr).
In the presence of restrictions preventing internal pointers from being commoned across GC points, no child can be an internal pointer.
Structure
Data Type
Address
Children
The number and types of the children are variable, as in the call family of opcodes. Immutable types are generally "small," so the children should usually be relatively few.
The purpose of each child is to provide the initial (and only) value of the field corresponding to its position for the newly created object.
Symbol Reference
The symbol reference specifies the type of object to be allocated and initialized. Alternately, if the type is to be specified by an extra first child, there should be a generic symbol reference similar to (or possibly the same as) jitNewValue.
IL Opcode Properties
The properties will be similar to, if not the same as, those for new:
- HasSymbolRef
- MayUseSystemStack
- CanRaiseException
- New
It's possible that the New property could engage logic in the compiler that does not apply properly to newvalue, e.g. looking at the first child to determine the type. In that case, New would have to be left off, or the inapplicable logic adjusted to account for newvalue.
The new opcode and related opcodes (newarray, anewarray, etc.) also have the LikeDef property, though I do not know why, so it is unclear whether this property also applies to newvalue.
None.
Node Flags
One new node flag identityless, or perhaps anonymous, indicating that the resulting object has no identity.
Control Flow
An exception may be thrown.
Operation
An instance of the specified type is allocated (as though via new), and each field is initialized using the value of the corresponding child.
On platforms with a weak memory model (e.g. POWER), a memory fence is required after initialization but before any write that could publish the resulting reference. Without such a fence, other threads could observe uninitialized fields.
Result Value
A reference to the freshly allocated object.
Side Effects
An exception may be thrown.
Anchoring
This operation must be anchored because it is a GC point and an exception point.
Scope
Use of this opcode is likely to create optimization opportunities of the kinds mentioned in the motivation.
The benefits of this opcode are platform-independent.
No.
Implementation Dependencies
The newvalue opcode has a number of implementation dependencies. Note that project-specific logic is only necessary for projects making use of newvalue, not necessarily every project.
Field Info
First, in order to generate, transform, or evaluate a newvalue node, it will be necessary to enumerate the fields of the type to be allocated, and to impose a consistent order on those fields. For a type used with newvalue, the compiler should know the number of fields, and for each field it should know:
- TR::DataType of the field; and
This information comes from project-specific type introspection (e.g. provided via a ClassEnv method), but it can be used in a project-independent way by transformations eliminating loads, and to implement the initialization.
Lowering
Second, lowering should be performed before evaluation to avoid effectively duplicate evaluators. Consider the following tree, creating a Vec3 with coordinates from local variables x, y, and z.
This tree can be lowered to a sequence that first allocates the object, and then initializes each field:
It is important to note that this separation does not interfere with earlier optimizations. It will result in essentially the same output code we would get by evaluating newvalue directly.
The anchoring will often be unnecessary, as it is in this example, but in general it may be required to prevent commoning an internal pointer across new.
Field Shadow Fabrication
To generate the initialization sequence during lowering, the compiler needs a shadow symbol reference for each field. The shadow for a field may need to be fabricated based on the field information mentioned above. These shadows will also be useful for other purposes, such as privatization.
The fabricated field shadows should be unified with any corresponding "naturally occurring" shadows, e.g. from a Java constant pool, to prevent missed opportunities. Because the naturally occurring shadows are generated in a project-specific way, OMR would need to request shadows through an extension point that can be overridden on a per-project basis.
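The three dependencies (field info, lowering, and field shadows) fit together roughly as in the Python sketch below. It is an illustration under stated assumptions, not the OMR implementation: FieldInfo, FIELD_INFO, STORE_OPCODE, field_shadow, and lower_newvalue are hypothetical names, the field name in FieldInfo is an assumed attribute beyond the data type, trees are plain tuples, "<new>" is a placeholder for a reference to the result of the new node, and the exact lowered shape is a guess based on the description above.

```python
from collections import namedtuple

# Hypothetical per-field description; the name attribute is an assumption.
FieldInfo = namedtuple("FieldInfo", ["name", "data_type"])

# Stand-in for project-specific type introspection, in a fixed field order.
FIELD_INFO = {
    "Vec3": [FieldInfo("x", "Float"), FieldInfo("y", "Float"), FieldInfo("z", "Float")],
}

# Indirect store opcode per data type (illustrative subset).
STORE_OPCODE = {"Float": "fstorei", "Int32": "istorei", "Address": "astorei"}

# Shadows that already exist, e.g. one created from a constant pool entry.
shadows = {("Vec3", "x"): "natural:Vec3.x"}


def field_shadow(type_name, info):
    """Reuse an existing shadow for the field if there is one, else fabricate one."""
    key = (type_name, info.name)
    if key not in shadows:
        shadows[key] = "fabricated:%s.%s" % key
    return shadows[key]


def lower_newvalue(type_name, children):
    """Lower newvalue into anchored children, a new, then one store per field."""
    fields = FIELD_INFO[type_name]
    if len(children) != len(fields):
        raise ValueError("newvalue needs exactly one child per field")
    trees = [("treetop", child) for child in children]  # anchor before the GC point
    trees.append(("treetop", ("new", type_name)))
    for info, child in zip(fields, children):
        store = STORE_OPCODE[info.data_type]
        trees.append((store, field_shadow(type_name, info), "<new>", child))
    return trees


lowered = lower_newvalue("Vec3", ["fload x", "fload y", "fload z"])
for tree in lowered:
    print(tree)
```

Routing every shadow request through one function is what allows a project to override it so that fabricated and naturally occurring shadows for the same field end up unified.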
Performance
This opcode should facilitate performance improvements by aiding optimization related to the use of immutable objects.
Testing Considerations
This opcode has multiple project-specific dependencies. Field shadow fabrication can have a default implementation in OMR that produces usable symbol references with no concern for naturally occurring shadows, but type definition and introspection would need to be mocked somehow for Tril tests, as would the implementation of new.
IL Validation
TBD