Do block predecessor lists need to be sorted?

dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

https://docs.microsoft.com/dotnet/core/

MIT License

15.17k stars 4.72k forks source link

Do block predecessor lists need to be sorted? #99730

Open BruceForstall opened 7 months ago

BruceForstall commented 7 months ago

fgAddRefPred is careful to insert predecessor edges in increasing edge source block order. The comments say (an excerpt): "This allows us to discover the loops in optFindNaturalLoops from innermost to outermost."

optFindNaturalLoops no longer depends on block number orders.

Can we stop sorting the predecessor lists? Or are there other clients who assume a sorted order?

@dotnet/jit-contrib

dotnet-policy-service[bot] commented 7 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

AndyAyersMS commented 7 months ago

I think @amanasifkhalid looked at this recently, so he should chime in.

We could probably relax this requirement; however it leads to a bit of uncontrolled behavior in some optimizations (pred list order reflects who last messed with the list, and many things walk the pred lists -- similar perhaps to our ambident dependence on bbNum) and some potential efficiency losses (we must search the entire list for dups). I don't recall if these losses are paid back by the gains from not having to sort, the lists are typically small so searching and sorting should be fairly cheap, except when the lists grow large.

I played around with using bbID for the sorting key, so we would not have to resort after renumbering, but I don't recall the results and didn't copy off the CI data: https://github.com/dotnet/runtime/pull/93636.

amanasifkhalid commented 7 months ago

We could probably relax this requirement; however it leads to a bit of uncontrolled behavior in some optimizations

I revived my work for this locally, and it indeed does have some pretty large diffs -- I've shared the win-x64 ones below. While most of the time the order of predecessor edges iterated doesn't matter, in some cases (e.g. fgHeadTailMerge) the ordering affects behavior. However, TP improvements from not having to maintain ordering are about -0.2% to -0.25% on win-x64.

win-x64 diffs

jakobbotsch commented 7 months ago

We've also seen this method (sorting the edges) at the top of some ETW traces, so the cost of it may actually be higher than what SPMI throughput reveals. OTOH I think it's beneficial/important to ensure that we have consistent iteration orders of things so that you're not subjected to spurious diffs from seemingly unrelated changes (e.g. if I do FG change A and then FG change B, it should ideally be the case that I have no diffs if I swap their order, and sorting like this should ensure that).

amanasifkhalid commented 7 months ago

OTOH I think it's beneficial/important to ensure that we have consistent iteration orders of things so that you're not subjected to spurious diffs from seemingly unrelated changes

We could try tracking down the optimization passes sensitive to pred list ordering, and sort pred lists beforehand, but this would probably negate any TP savings we get from removing the sorting invariant, and worsen code maintainability. As for MinOpts, I imagine the TP gains aren't that significant since we already take the initializingPreds=true fast path in fgAddRefPred when initially creating the flowgraph, and we probably don't modify the flowgraph all that much after.