Open AndyAyersMS opened 1 year ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
Author: | AndyAyersMS |
---|---|
Assignees: | - |
Labels: | `area-CodeGen-coreclr` |
Milestone: | Future |
Do you think this was intended as some sort of indirect cost of a phi assignment? For example, it "full SSA", un-ssa requires assignments which of course has costs. Are phis indicative of register shuffling that might need to happen? To be clear, I don't really think so - the normal case would keep the variable in the same register - but certainly phis indicate some sort of complexity of the code.
We currently don't allow overlapping SSA lifetimes so there is no codegen implication for PHIs.
And if we ever did want to allow overlapping lifetimes, we'd need to fix our SSA maintenance story -- by the time we drop SSA it's no longer accurate.
I think a (possible) simpler next step would be to rename disjoint SSA webs right after we build SSA. That would give us some freedom to rearrange things in ways we can't do now, but I suspect other changes might also be needed to really see any benefit from this.
Sorry, I was trying to contrast against overlapping SSA lifetimes by saying things like "indirect cost" and "full SSA".
A concrete example would be a phi of an integer remainder (rdx) and a call return (rax). The phi is the clue that a register shuffle will be needed. Whether this is important enough to matter (it's not like we assign a cost to an integer remainder (rdx) feeding the first argument of a call (rcx)) is a different question. And the cost of the phi is in the predecessor anyway...
In the current design, all PHI stores are costed to be zero (look for where they are created in ssabuilder.cpp
). The fact we end up with this non-zero cost is accidental.
Sorry, I was trying to contrast against overlapping SSA lifetimes by saying things like "indirect cost" and "full SSA".
A concrete example would be a phi of an integer remainder (rdx) and a call return (rax). The phi is the clue that a register shuffle will be needed. Whether this is important enough to matter (it's not like we assign a cost to an integer remainder (rdx) feeding the first argument of a call (rcx)) is a different question. And the cost of the phi is in the predecessor anyway...
I guess it's fair to say that LSRA rediscovers this stuff later on in its own way... note it does a poor job in general handling cyclic constraints (not surprising given its "linear" world view....)
Was playing around with removing PhiDefinitions just after the optimization passes, but this lead to an unexpectedly large number of diffs. The culprit seems to be attribution of size cost to a PHI def store:
Leading to more frequent code duplication in
fgOptimizeBranch
.EG in benchmarks.run_pgo
On the whole this additional duplication might not be bad, as the heuristic thinks it is avoiding a branch to branch, but the duplication seems to put a lot of pressure on LSRA, leading to many more split edge resolution moves (with corresponding horrible block placements). So perhaps this optimization increases the density of critical edges or something?
At any rate attributing costs to PHIs like this is nonsensical; fixing that might actually have a broader impact as it probably feeds into decisions made during optimization as well.
My main motivation for eagerly removing the PhiDefinitions was better TP and less gunk possibly blocking optimizations, oddly my local runs show no TP benefit at all, though perhaps I mismeasured.
cc @dotnet/jit-contrib