JIT: Stop representing local addresses as TYP_BYREF

jakobbotsch commented 1 year ago

The JIT today represents local addresses as TYP_BYREF, in particular for implicit byrefs. However, after speaking to @jkotas, we believe that implicit byrefs cannot actually point to heap today. Previously this may have been the case for remoting/reflection. We should update the JIT to uniformly treat all local addresses as TYP_I_IMPL and to type implicit byrefs as TYP_I_IMPL, which should remove some GC reporting and write barriers we emit today.

We should also be able to get rid of impBashVarAddrsToI and various places that we retype local addresses as TYP_I_IMPL.

Return buffers should stay TYP_BYREF typed as the JIT itself will pass heap pointers as return buffers sometimes.

The following section in clr-abi should also be updated: https://github.com/dotnet/runtime/blob/3a3692029a2a038581fdc8c1c6496e0506a7e0c7/docs/design/coreclr/botr/clr-abi.md#amd64-only-by-value-value-types

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.

Issue Details

The JIT today represents local addresses as `TYP_BYREF`, in particular for implicit byrefs. However, after speaking to @jkotas, we believe that implicit byrefs cannot actually point to heap today. Previously this may have been the case for remoting/reflection. We should update the JIT to uniformly treat all local addresses as `TYP_I_IMPL` and to type implicit byrefs as `TYP_I_IMPL`, which should remove some GC reporting and write barriers we emit today. We should also be able to get rid of `impBashVarAddrsToI` and various places that we retype local addresses as `TYP_I_IMPL`. Return buffers should stay `TYP_BYREF` typed as the JIT itself will pass heap pointers as return buffers sometimes. The following section in clr-abi should also be updated: https://github.com/dotnet/runtime/blob/3a3692029a2a038581fdc8c1c6496e0506a7e0c7/docs/design/coreclr/botr/clr-abi.md#amd64-only-by-value-value-types

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

jakobbotsch commented 1 year ago

I have a branch where I started doing the work: https://github.com/dotnet/runtime/compare/main...jakobbotsch:runtime:type-addresses-of-locals-unmanaged?expand=1

Likely this would be easier when GT_ADDR is removed.

cc @dotnet/jit-contrib

BruceForstall commented 1 year ago

However, after speaking to @jkotas, we believe that implicit byrefs cannot actually point to heap today.

Is there a way to validate this believe? E.g., create an instrumented JIT that checks the addresses of all implicit byrefs and run our test suite?

jakobbotsch commented 1 year ago

Is there a way to validate this believe? E.g., create an instrumented JIT that checks the addresses of all implicit byrefs and run our test suite?

Yeah, seems like that wouldn't be too hard to do. It would probably be a good idea to validate it before committing too much to this.

If it isn't the case we are probably skating on thin ice, however, since Roslyn allows taking the address of parameters as unmanaged pointers. Not sure how this was reconciled in the past (maybe things were always pinned but write barriers were still needed for managed types, and TYP_BYREF is necessary for this?)

jkotas commented 1 year ago

I have looked at the history. We had number of issues around this in the original x64 port that were fixed later. I see the last set of fixes in 2009. If this text was written before 2009, it would explain where it came from.

jakobbotsch commented 1 year ago

Somewhat related is that can get some odd-typed trees today when inlining is involved:

STMT00051 ( 0x000[E-] ... ??? ) <- INL02 @ 0x000[E-] <- INLRT @ 0x0A1[E-]
               [000177] -A---------                         ▌  ASG       long  
               [000176] D------N---                         ├──▌  LCL_VAR   long   V10 tmp1         
               [000175] -----------                         └──▌  SUB       byref 
               [000173] -----------                            ├──▌  FIELD_ADDR byref  SequentialLayoutMinPacking`1[ushort]:_byte
               [000174] -----------                            │  └──▌  LCL_VAR_ADDR byref  V04 loc1         
               [000172] -----------                            └──▌  LCL_VAR_ADDR long   V04 loc1

This is produced from https://github.com/dotnet/runtime/blob/e579ccb60acb044453f2cf1907482b165658488e/src/tests/Interop/StructPacking/StructPacking.cs#L1665-L1668

as part of inlining. The fact that the result of the SUB is TYP_BYREF seems moderately dangerous. As part of retyping local addresses as TYP_I_IMPL we might need to investigate retyping inline arguments that are directly substituted in some cases.

jakobbotsch commented 1 year ago

I probably won't have time to do the full cleanup in 8.0, but we can at least get rid of write barriers for these cases.

markples commented 1 year ago

I think this is ok because of the specifics about implicit byrefs (and current optimizations), but retyping/converting GC pointers is a particularly tricky area.

Conversions between general GC pointers and integers need some restrictions on code motion because they can depend on the overall "pinned state" of the program. Pointers that are known to not point into the managed heap do not need these restrictions, but a separate change to allow better optimization of "known non-GC pointers" could accidentally create a problem with "normal" GC/int conversions. (i.e., there are subtle differences between "pointer that is known to not point into the GC heap" and "pointer instance that we don't need to specifically track at this moment in time")
(Writes to pinned locations are the other half of the above and have restrictions, but that's beyond the scope of this topic.)
In the linked abi, these is mention of the disabled optimization. Is disabling it an intended-to-be-permanent design choice, or a temporary mitigation of a problem? It's theoretically nice to be able to skip the copy into one of these temporary locations but probably limited in practice due to restrictions on ordering, introducing writes, etc. (also not sure why the optimization could only happen on return values and not incoming ones?)

jakobbotsch commented 1 year ago

There are no conversions involved here -- the implicit byrefs passed are all already unmanaged pointers. If they weren't then we had very easily exposable GC holes because C# allows taking the address of an argument as an unmanaged pointer, and the JIT doesn't do anything defensively in this situation.

In the linked abi, these is mention of the disabled optimization. Is disabling it an intended-to-be-permanent design choice, or a temporary mitigation of a problem? It's theoretically nice to be able to skip the copy into one of these temporary locations but probably limited in practice due to restrictions on ordering, introducing writes, etc. (also not sure why the optimization could only happen on return values and not incoming ones?)

Well the document is a bit confusing on the topic: the "disabled optimization" discussed seems to be more related to return buffers for reflection invoke. Even if that particular optimization was enabled the JIT would not be able to make use of it unless we also decided to disable our own optimization that passes GC pointers directly as the return buffer (see e.g. https://godbolt.org/z/P7nT17saE).

I don't think there is any problem here -- it's a trade-off. If we want to only have unmanaged ret buffer pointers then we can skip some GC reporting and write barriers, but we would not be able to do the optimization we do above (and reflection invoke would need to change). Whether or not this is the right trade-off I don't know.

also not sure why the optimization could only happen on return values and not incoming ones?

We would need to know whether the callee mutates the parameter to be able to avoid a defensive copy. And of course we would need to find a way to deal with the function taking the address of such a parameter. So it seems considerably harder than for the ret-buffer case.

markples commented 1 year ago

I definitely agree that the idea here sounds fine - just wanted to get some notes in here because it's easy to go a little bit further and get into trouble.

C# allows taking the address of an argument as an unmanaged pointer

That's certainly problematic for those advanced kinds of optimizations. I guess the copy could be on the callee side, but...

Many of these are likely interprocedural analyses (so either AOT or more sophisticated JIT analysis, probably combined with tiering) so then you can just check these various things.

dotnet / runtime

JIT: Stop representing local addresses as TYP_BYREF #82166