[R2R] Drop inlining observations when inlining is not performed

jkotas commented 11 months ago

Reported by internal Microsoft team. Internal link: Watson crash from hitting breakpoint?

Observed behavior

R2R images created with --opt-cross-module:* option have spurious int3 instructions.

Analysis of the problem

Inlining in RyuJIT is a two-step process:

Creating observations about the inline
Actual inlining

We record the dependency on the exact version of the inlined method body only when we perform the actual inlining. I do not see any code for recording dependency on observations.

The problem is that the optimizations can use results from observations even when the inlining is not performed. The observation causing problem here is CALLEE_DOES_NOT_RETURN that makes codegen to think that the code after the method is unreachable and replace it with int3. RyuJIT should drop observations like this for methods outside the current version bubble.

ghost commented 11 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

Issue Details

Reported by internal Microsoft team. Internal link: [Watson crash from hitting breakpoint?](https://watsonportal.microsoft.com/Failure?FailureSearchText=4da65c29-ed31-33e0-acb6-11c6b62ec228) ## Observed behavior R2R images created with `--opt-cross-module:*` option have spurious `int3` instructions. ## Analysis of the problem Inlining in RyuJIT is a two-step process: - Creating observations about the inline - Actual inlining We record the dependency on the exact version of the inlined method body only when we perform the actual inlining. I do not see any code for recording dependency on observations. The problem is that the optimizations can use results from observations even when the inlining is not performed. The observation causing problem here is [CALLEE_DOES_NOT_RETURN](https://github.com/search?q=repo%3Adotnet%2Fruntime%20CALLEE_DOES_NOT_RETURN&type=code) that makes codegen to think that the code after the method is unreachable and [replace it with int3](https://github.com/dotnet/runtime/blob/f5d53478febc328c42e4efcb43d0646d9be176cd/src/coreclr/jit/codegenlinear.cpp#L690-L693). RyuJIT should drop observations like this for methods outside the current version bubble.

Author:	jkotas
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

AndyAyersMS commented 11 months ago

Thanks for tracking this down. The JIT has had this behavior dating back to .NET Core 2.0 or so. This could cause problems even without R2R, if a profiler driven rejit changed a method that always threw into one that didn't always throw. Presumably not something people do very often. Do you think we need to fix this when jitting as well?

I don't think we expose the cross-version-bubble info to the jit yet...?

jkotas commented 11 months ago

I think we want to introduce a new JIT/EE API that the JIT will use to report and confirm attempt to use inlining observations that affect behavior. notifyInstructionSetUsage is an existing API that follows similar pattern.

The EE side can tell the JIT:

Yes, you can depend on observations about this method. The EE side needs to track this dependency same way as it tracks dependencies on actually inlined methods in this case. This option may be better for the Tier1 at runtime.
No, you cannot depend on observations about this method. It is still ok for the JIT to use the observation in a way that does not affect behavior - for example, mark the block as cold instead of unreachable. This option may be better for the speculative R2R inlining outside current version bubble. The code may be a bit less efficient, but we will need to do less tracking and less validation before using it.

If this sounds overengineered to you, we can do just the first one, ie always treat the dependency on observations same way as inlining.

EgorBo commented 10 months ago

@jkotas Looks like no-return is the only observation that can cause problems.

Should we call it something like bool canRelyOnObservations(method) or bool canAssumeMethodWontChange(method)?

Presumably, the same issue can happen with some hot-reload/ReJIT? if we change IL for a method that was recognized as never-return already.

jkotas commented 10 months ago

Should we call it something like bool canRelyOnObservations(method) or bool canAssumeMethodWontChange(method)?

I would expect canXXX method to be side-effect free. If you are going with my proposal that the EE side is going to do bookkeeping (e.g. similar to existing AddInlining in reportInliningDecision), I think the method name should imply that it is not side-effect free.

I think a good name can be something like notifyMethodInfoUsage (prior art - notifyInstructionSetUsage). Alternatively, we can overload the existing reportInliningDecision - but that makes the contract of reportInliningDecision even more complicated. I think a new dedicated method would be better.

Presumably, the same issue can happen with some hot-reload/ReJIT?

Correct. The new JIT/EE method should be implemented in both runtime JIT and R2R compiler.

EgorBo commented 10 months ago

I think a good name can be something like notifyMethodInfoUsage

Should we stop inlining/IL inspecting if it returns false? Or inlining is fine, only observations should be discarded?

jkotas commented 10 months ago

Inlining and observing is fine, just the observation cannot be used for anything that affects observable behavior. For example, the JIT can assume that the method is probably not going to return, but it cannot assume that the method is guaranteed no return.

dotnet / runtime

[R2R] Drop inlining observations when inlining is not performed #95815

Observed behavior

Analysis of the problem