aviansie-ben opened 4 years ago
Could you further explain the need for `NodeMemoryReference` to exist? It seems like an odd concept to me that doesn't exist elsewhere. What sort of counting or tracking of nodes and memory references needs to be done?
@fjeremic The reason that nodes need to be tracked here is that some of the nodes' registers may end up being used in the `TR::MemoryReference` itself, while others may have been evaluated and used prior to generation of the memory reference. The nodes whose registers are used in the memory reference must not have their reference counts decremented until after the use of the memory reference. Take the following tree for instance:
```
[1] aloadi
[2]   aladd
[3]     aload <temp 1>
[4]     lload <temp 2>
```
Assuming that node [2] is only used here, if node [3] ends up in rX and node [4] ends up in rY, with node [1] evaluating into rZ, we want to generate a single `ldx rZ, rX, rY` instruction to perform the load. In this case, the returned memory reference will end up referring to rX and rY, so the refcounts for nodes [3] and [4] must not be decremented until after the memory reference is used, while the refcount for node [2] can be decremented immediately. Because of this, we need a way to indicate to the caller which nodes need to have their refcounts decremented after the caller is done using the memory reference.
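To make that contract concrete, here is a rough, hypothetical sketch of the `NodeMemoryReference` idea. The `Stub*` types and all names here are mine for illustration only; they are not the real OMR classes.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins for TR::Node and TR::MemoryReference, used only to
// illustrate the contract described above; these are not the real OMR types.
struct StubNode
   {
   int refCount;
   explicit StubNode(int rc) : refCount(rc) {}
   };

struct StubMemoryReference
   {
   // base/index registers, displacement, etc. would live here
   };

// Sketch of the NodeMemoryReference idea: a memory reference paired with the
// nodes whose registers it refers to. The caller decrements those nodes'
// reference counts only after the instruction using the memory reference has
// been emitted.
class NodeMemoryReferenceSketch
   {
   public:
   NodeMemoryReferenceSketch(StubMemoryReference *memRef, std::vector<StubNode *> nodes)
      : _memRef(memRef), _nodesToDecrement(std::move(nodes)) {}

   StubMemoryReference *memRef() const { return _memRef; }

   // Called once the caller is done using the memory reference.
   void decReferenceCounts()
      {
      for (StubNode *node : _nodesToDecrement)
         node->refCount--;
      _nodesToDecrement.clear();
      }

   private:
   StubMemoryReference *_memRef;
   std::vector<StubNode *> _nodesToDecrement;
   };
```

In the `aloadi` example above, nodes [3] and [4] are the ones that would be recorded in such a structure, while node [2]'s refcount could be decremented immediately.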
Previously, we handled this by keeping track of a "base node" and an "index node" in `TR::MemoryReference` itself, but this makes no sense IMO since the vast majority of memory references do not refer to nodes and so do not need to keep track of this information. By splitting this into a separate data structure which is used only where this information needs to be kept track of, details about how a memory reference was constructed from trees can avoid leaking into `TR::MemoryReference` at all.
@aviansie-ben check `ArtificiallyInflateReferenceCountWhenNecessary` on Z. This is handled transparently [1] by artificially incrementing the reference count of nodes which would otherwise die if decremented by evaluators before the use of the memory reference. The data is kept on a stack during evaluation. It avoids the user having to do anything, and they don't have to worry about node reference counts when constructing memory references.
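As a rough, hypothetical illustration of that transparent approach (stub types and names are mine, not the real Z codegen code):

```cpp
#include <cassert>
#include <stack>

// Hypothetical stand-in for TR::Node; not the real OMR type.
struct StubNode
   {
   int refCount = 1;
   };

// Sketch of the transparent approach described above: nodes that would
// otherwise die while a memory reference still needs their registers get an
// artificial extra reference, remembered on a stack during evaluation. Once
// the instruction using the memory reference has been emitted, the deferred
// decrements are applied, so the caller never has to think about it.
class ArtificialInflationSketch
   {
   public:
   // Keep the node alive past the point where an evaluator decrements it.
   void artificiallyInflate(StubNode *node)
      {
      node->refCount++;
      _inflated.push(node);
      }

   // Called after the memory reference has been consumed.
   void stopUsingMemoryReference()
      {
      while (!_inflated.empty())
         {
         _inflated.top()->refCount--;
         _inflated.pop();
         }
      }

   private:
   std::stack<StubNode *> _inflated;
   };
```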
@fjeremic I see how that can resolve the problem, but I'm not sure I like that as a solution. Artificially increasing the reference count and then scheduling it to be decremented later seems like a bit of a hack to me and requires quite a bit of additional bookkeeping, although I have to admit it may be better from the evaluator's point of view to not have to worry about stuff like this.
That being said, I don't think the benefits of a solution like that really apply once an interface like `LoadStoreHandler` is put into place. The whole point of this new interface is to make it unnecessary for evaluators to construct a `TR::MemoryReference` referring to the address of a load/store node themselves in the first place, as doing so is inherently dangerous due to the aforementioned issues with volatile variable accesses. The `NodeMemoryReference` class that's being added here is completely internal to `LoadStoreHandler`/`LoadStoreHandlerImpl`, so the evaluators already don't have to worry about the node reference counts themselves.
> make it unnecessary for evaluators to construct a `TR::MemoryReference` referring to the address of a load/store node themselves in the first place

Will we be enforcing this by removing such constructors/factory methods on `TR::MemoryReference` to make sure the user isn't even able to call the APIs which construct memory references by passing in a root load or store?
> Will we be enforcing this by removing such constructors/factory methods on `TR::MemoryReference` to make sure the user isn't even able to call the APIs which construct memory references by passing in a root load or store?
Sorry, I should have been more clear about this in my original proposal. The eventual goal is to remove these APIs on `MemoryReference` and in doing so both prevent misuse in evaluators and completely decouple `MemoryReference` from tree evaluation. The initial PR will not do this and will instead just have the `generateMemoryReference` function call the old APIs, but all of this code would be moved to be internal to `LoadStoreHandlerImpl` once OpenJ9 is moved to the new API.
Awesome. I support your efforts then! I'll take a look once the dust settles on your eventual PR and if all looks well we can migrate the Z codegen to a similar solution as I do agree the artificial mucking with reference counts is not the greatest solution.
@gita-omr @zl-wang Just a quick FYI on this, since it's aiming to fix and prevent a family of volatility-related bugs I've found in the last week or so where the `lwsync`/`isync` barrier after a volatile load is missing when certain codegen optimizations are triggered. Specifically, these are locations like [1] where a load underneath another operation is optimized without checking whether the symref of the load is volatile. I can confirm that it's possible to trigger this bug under certain conditions, though it can be difficult to get the trees to look exactly right for the bad optimizations to actually trigger (the volatile load must swing down and only be referenced in the one problematic tree), so this could be causing some difficult-to-reproduce race condition bugs.
@aviansie-ben could you please add @zl-wang and myself as reviewers? We would certainly like to take a look and discuss.
@gita-omr There is no PR associated with this at the moment. This is simply an issue for discussing high-level design of the new API for performing these sorts of optimizations in a safe manner. I hope to have a WIP PR open by the end of the day on Monday, and I'll certainly add you as reviewers on that PR once it's open.
I've now opened #5652 as a WIP PR for the OMR side of this work. These changes still need more testing and there are still many areas in OpenJ9 that need to be updated to use the new API.
@aviansie-ben would it be possible to add examples of how the new interface would be used in (1) the basic case (2) the case in https://github.com/eclipse/omr/issues/5630#issuecomment-719061478
Nvm, found examples in the PR.
Currently, the generation of load and store sequences is deeply entangled with `TR::MemoryReference` and has code copied in numerous different locations. Specifically, we are currently able to generate a memory reference for an arbitrary load/store node from any evaluator and then emit instructions manually for performing the load/store; for instance, loads on Power are implemented by generating a memory reference for the node and then manually emitting the load instruction. However, code for emitting loads in particular is copied all over the codegen, as it is common to optimize the case where a child is a single-reference load by changing the opcode used to perform the load (e.g. an `ibyteswap` of an `iload` can be done in a single `lwbrx` instruction). This has resulted in a number of bugs and design issues that have proliferated uncontrolled throughout the Power codegen:

- Forgetting to emit the `lwsync` instruction for volatile loads, which can result in nondeterministic and unreproducible bugs (I've found at least a dozen locations that have this problem so far)
- Requiring `TR::MemoryReference` to keep track of nodes whose reference counts need to be decremented, unnecessarily coupling that class to tree evaluation
- Causing `TR::MemoryReference` to be tightly coupled to how unresolved symbol references work in OpenJ9
- Placing significant amounts of logic in `TR::MemoryReference`, making that class unnecessarily complicated when it should be quite simple in theory

To address these issues, I'd like to propose adding a new extensible class called
`LoadStoreHandler` to the Power codegen that will handle these operations in a safe and controlled manner (the version described here is slightly simplified to eliminate some unnecessary implementation details regarding 64-bit loads/stores on 32-bit systems).

The idea here is to provide a simple interface for the evaluators to use that makes it much harder to use incorrectly and hides unnecessary details from the user. For most use cases, a simple call to `generateLoadNodeSequence` or `generateStoreNodeSequence` can be used to encompass the entire operation of loading/storing with a particular opcode. This makes it completely impossible to forget to emit memory barriers when required, as the fact that this is even happening at all is hidden from the user entirely.

For more advanced use cases,
`generateComputeAddressSequence` can be used to compute the effective address of a load/store into a register. This register can later be passed to `generateLoadAddressSequence` or `generateStoreAddressSequence` to actually produce the load/store and corresponding barriers. While potentially more error-prone, this is required for cases where special code (e.g. write barrier handling) must run between when the child nodes are evaluated and when the memory operation is actually performed.

In order to make extension of these methods by downstream projects easier without needing to copy the OMR definitions of the methods on `LoadStoreHandler`, the OMR implementations would delegate to some methods from another extensible class that defines internal implementation details and isn't meant to be used directly in tree evaluators.

Doing this has a couple of notable benefits. For starters, generation of the sequences for loads and stores is centralized into one function which uses an arbitrary
`TR::MemoryReference` without needing to expose that detail to the evaluators. This allows both the address-based and node-based methods to be overridden by a downstream project simultaneously without requiring such code to be duplicated, and allows downstream projects' implementations of these methods to delegate to the OMR versions for simple cases without needing to copy code.

Furthermore, the introduction of `NodeMemoryReference` as an abstract representation of a memory reference alongside a number of nodes whose registers are being used in the memory reference allows `TR::MemoryReference` to no longer have to worry about keeping track of nodes itself.

For now, these functions would all be implemented by delegating to the existing methods on `TR::MemoryReference` that we already use in order to avoid breaking changes for the time being. However, significant amounts of code could be moved out of `TR::MemoryReference` after an interface like this is introduced without breaking evaluators.
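As a rough, hypothetical sketch of how an interface like this could look (the `Stub*` types and string-based instruction log stand in for `TR::Node`, `TR::Register`, and the real instruction stream; the barrier opcodes chosen below are illustrative only):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stubs; the real proposal operates on TR::Node, TR::Register,
// TR::InstOpCode, and generated instructions rather than strings.
struct StubNode
   {
   bool isVolatile = false;
   };

using InstructionLog = std::vector<std::string>;

// Sketch of the proposed LoadStoreHandler shape: evaluators request an entire
// load/store sequence and never build the memory reference themselves, so the
// barriers required for volatile accesses cannot be forgotten.
class LoadStoreHandlerSketch
   {
   public:
   static void generateLoadNodeSequence(InstructionLog &log, StubNode *node, const std::string &loadOp)
      {
      log.push_back(loadOp);       // e.g. "lwz", "ld", or an optimized "lwbrx"
      if (node->isVolatile)
         log.push_back("isync");   // acquire-style barrier after a volatile load
      }

   static void generateStoreNodeSequence(InstructionLog &log, StubNode *node, const std::string &storeOp)
      {
      if (node->isVolatile)
         log.push_back("lwsync");  // release-style barrier before a volatile store
      log.push_back(storeOp);
      }
   };
```

The advanced path (`generateComputeAddressSequence` followed by `generateLoadAddressSequence`/`generateStoreAddressSequence`) would split the same sequence into two steps so that code such as write barrier handling can run in between, while still keeping the barrier emission out of the evaluators' hands.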