[JIT] Remove BBF_NONE_QUIRK

In the process of removing BBJ_NONE from the JIT's flowgraph code, #94239 introduced BBF_NONE_QUIRK to indicate a BBJ_ALWAYS block was previously a BBJ_NONE, and that if the block's jump target is the next block (thus mimicking fall-through behavior), the JIT should behave as if this block were still a BBJ_NONE. This decision was motivated by a desire to minimize diffs in the initial PR, and avoid tricky refactors in a few cases -- in particular, removing BBJ_NONE initially caused several small behavioral changes in Compiler::fgFindInsertPoint that cascaded into larger codegen diffs due to differences in block ordering. Introducing BBF_NONE_QUIRK somewhat alleviated these diffs.

As part of our broader effort of modernizing the JIT's flowgraph code (see #93020), we should prioritize removing BBF_NONE_QUIRK as we reduce our reliance on implicit fall-through behavior. As of writing, we set this flag in dozens of places, but we only check if it is set in four instances. I will outline these places below, in order of what I perceive to be increasing difficulty of removal:

[x] We assert BBF_NONE_QUIRK is set in importer.cpp. Removing this is trivial, and since this is during importation, I think asserting block->JumpsToNext() would be equivalent.
[x] In Compiler::placeLoopAlignInstructions, we check that the flag isn't set before considering placing align instructions after the block. Removing this check doesn't produce any diffs locally, so it is probably safe. It might make sense to skip a BBJ_ALWAYS that jumps to the next block to better emulate the previous behavior with BBJ_NONE, but the jump target is subject to move, so maybe we don't need any additional check here. On the other hand, we currently don't try to remove jumps to the next block if the block with the jump has alignment padding at the end, so allowing more BBJ_ALWAYS blocks to have alignment might slightly reduce the possible applications of that optimization.
[x] In Compiler::fgUpdateFlowGraph, if a block's jump target is a BBJ_ALWAYS to the next block with BBF_NONE_QUIRK set, we don't attempt fgOptimizeBranchToEmptyUnconditional, as we might be able to compact the BBJ_ALWAYS later (which is what we initially did for BBJ_NONE blocks). Removing this restriction unlocks some dramatic code size improvements, though some of these improvements are due to the JIT deciding not to clone a loop. We will have to investigate why this affects decisions around loop cloning, and whether the reduced cloning is desirable or should be fixed.
[x] In Compiler::fgFindInsertPoint, we check BBF_NONE_QUIRK as a proxy for fall-through behavior when deciding whether to insert after a specific block. I think it makes sense to try to keep BBJ_ALWAYS blocks and their jump targets contiguous if they are already, though later decisions could move the two apart and render our initial decision here useless. Now that there are far fewer moving pieces to contend with than in #94239, I'll try experimenting with this locally to see how sporadic the diffs are.
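
To make the four checks above concrete, here is a minimal sketch of what "behave as if this block were still a BBJ_NONE" amounts to, and what the check looks like once the flag is gone. Everything here is a toy model: the struct layout, the flag's bit position, and the helper names actsLikeNoneWithQuirk / actsLikeNoneWithoutQuirk are assumptions for illustration, not the JIT's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for the JIT's types. The names mirror the issue text, but
// this is NOT the real BasicBlock definition.
enum BBKinds { BBJ_ALWAYS, BBJ_COND, BBJ_RETURN };

enum BasicBlockFlags : uint32_t
{
    BBF_EMPTY      = 0,
    BBF_NONE_QUIRK = 1u << 0, // hypothetical bit position
};

struct BasicBlock
{
    BBKinds     kind   = BBJ_RETURN;
    uint32_t    flags  = BBF_EMPTY;
    BasicBlock* next   = nullptr; // next block in layout order
    BasicBlock* target = nullptr; // jump target (only meaningful for BBJ_ALWAYS here)

    bool KindIs(BBKinds k) const { return kind == k; }
    bool HasFlag(BasicBlockFlags f) const { return (flags & f) != 0; }

    // True if this block's jump target is the lexically next block.
    bool JumpsToNext() const
    {
        assert(KindIs(BBJ_ALWAYS));
        return (target != nullptr) && (target == next);
    }
};

// Quirk-based check: only blocks that were once BBJ_NONE (flag set) and
// still jump to the next block are treated as falling through.
inline bool actsLikeNoneWithQuirk(const BasicBlock& b)
{
    return b.KindIs(BBJ_ALWAYS) && b.HasFlag(BBF_NONE_QUIRK) && b.JumpsToNext();
}

// Flag-free check: any BBJ_ALWAYS to the next block is treated the same,
// regardless of how the block was originally created.
inline bool actsLikeNoneWithoutQuirk(const BasicBlock& b)
{
    return b.KindIs(BBJ_ALWAYS) && b.JumpsToNext();
}
```

The two predicates differ only for BBJ_ALWAYS blocks that jump to the next block without ever having been a BBJ_NONE, which is presumably where diffs come from when a check is switched from the first form to the second.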

Thank you to everyone who's made an effort to remove BBF_NONE_QUIRK so far in otherwise unrelated changes (@jakobbotsch, I noticed you've removed a few instances in some of your recent work).

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Author:	amanasifkhalid
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

cc @BruceForstall @AndyAyersMS

It seems that at some point in compilation, we've fixed the block layout order, and after that a BBJ_ALWAYS to the next block will "fall through" and not generate a branch (except, say, across hot/cold region boundaries). Do callers just "know" when this is true, or will we introduce something like BBF_NONE_QUIRK (say, BBF_ALWAYS_FALLTHROUGH?) to indicate that?

E.g., should bbFallsThrough return true for some BBJ_ALWAYS cases?

> Do callers just "know" when this is true

We currently just manually check if the block jumps to the next block, and base our decision-making around that. Post-block layout, that's a reliable check for fall-through behavior, but before we've finished creating/moving blocks, the result of that check is subject to change. I don't think there's much we can do about that until we start removing our broader dependence on fall-through behavior when making decisions around block layout.
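
As a small illustration of why the jumps-to-next check is unstable before layout is final, the sketch below inserts a block between a jump and its target. The Block type and insertAfter helper are hypothetical toy code, not JIT APIs.

```cpp
#include <cassert>

// Toy block: just a layout link and an unconditional jump target.
struct Block
{
    Block* next   = nullptr; // next block in layout order
    Block* target = nullptr; // unconditional jump target, if any

    bool JumpsToNext() const { return (target != nullptr) && (target == next); }
};

// Hypothetical helper: inserts newBlock right after pos in layout order.
// Jump targets are left untouched, just as block insertion does not
// retarget existing jumps.
inline void insertAfter(Block* pos, Block* newBlock)
{
    assert((pos != nullptr) && (newBlock != nullptr));
    newBlock->next = pos->next;
    pos->next      = newBlock;
}
```

If block a jumps to its successor b, a.JumpsToNext() is true. After insertAfter(&a, &c), a still targets b but no longer falls through to it, so any decision made on the earlier answer may no longer hold.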

We could introduce some extra state into the Compiler object that tracks whether block layout is finalized or not (maybe we can try to set fgSafeBasicBlockCreation to false in a phase earlier than codegen, and use that instead?), and then check this state to determine if a BBJ_ALWAYS to the next block will still "fall through" by the time we get to codegen. I'm hesitant to introduce a new flag like BBF_ALWAYS_FALLTHROUGH since I think we can use existing state to determine if a BBJ_ALWAYS will fall through. We could modify bbFallsThrough to check this state and return true for BBJ_ALWAYS in some cases, though it looks like the majority of our calls to bbFallsThrough are done before block layout is finalized, and the BBJ_NONE removal introduced an additional check to many of these call sites to cover the BBJ_ALWAYS jump-to-next case -- e.g. if (blk->bbFallsThrough() || (blk->KindIs(BBJ_ALWAYS) && blk->JumpsToNext())).
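
A rough sketch of the "use existing state" idea might look like the following. The CompilerState struct, its layoutFinalized field, and fallsThroughAtCodegen are all hypothetical names for illustration. The real bbFallsThrough handles more block kinds than this toy does, and the sketch ignores the hot/cold boundary case mentioned earlier.

```cpp
#include <cassert>

enum BBKinds { BBJ_ALWAYS, BBJ_COND, BBJ_RETURN };

// Toy block, not the JIT's BasicBlock.
struct BasicBlock
{
    BBKinds     kind   = BBJ_RETURN;
    BasicBlock* next   = nullptr; // next block in layout order
    BasicBlock* target = nullptr; // jump target for BBJ_ALWAYS / BBJ_COND

    bool KindIs(BBKinds k) const { return kind == k; }
    bool JumpsToNext() const { return (target != nullptr) && (target == next); }
};

// Hypothetical compiler-wide state: set once block layout can no longer change.
struct CompilerState
{
    bool layoutFinalized = false;
};

// Returns true if control can reach blk.next without an emitted branch.
// Simplification: BBJ_COND is always treated as falling through on its
// not-taken edge; BBJ_ALWAYS only counts once layout is final.
inline bool fallsThroughAtCodegen(const CompilerState& comp, const BasicBlock& blk)
{
    if (blk.KindIs(BBJ_COND))
    {
        return true;
    }

    return comp.layoutFinalized && blk.KindIs(BBJ_ALWAYS) && blk.JumpsToNext();
}
```

With this shape, call sites that run before layout is finalized would still need the explicit KindIs(BBJ_ALWAYS) && JumpsToNext() check quoted above, since the helper deliberately answers false for BBJ_ALWAYS until the state flips.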

[JIT] Remove BBF_NONE_QUIRK #95998