Closed jaredpar closed 2 years ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
The example is equivalent to this:
object objRef = { ... };
ref object objRefRef = ref objRef;
Console.WriteLine(objRefRef);
Both cases either result in objRef
being address-exposed and reported live for the whole method, or the Jit is able to eliminate the objRefRef
part and use objRef
directly (reporting liveness precisely).
So, yes, it is safe.
Practically, the Jit is super pessimistic about TypedReference
s currently, so the original example will always result in the local being address-exposed.
Tagging subscribers to this area: @JulieLeeMSFT See info in area-owners.md if you want to be subscribed.
Author: | jaredpar |
---|---|
Assignees: | - |
Labels: | `question`, `area-CodeGen-coreclr` |
Milestone: | - |
The example is equivalent to this:
Based on conversations with others I worry this is not the case. The case you mentioned uses a reference type which means the value is on the heap, there is an object header and the GC can get full type information. In the case of a struct
local it's laid out on the stack, there is no object header and it's questionable if the untyped ref
is sufficient to root all of its fields. That is essentially the core of the question here I wanted to get confirmation on.
Effectively other members of the dotnet/runtime team have claimed this is unsafe. There are responses in this thread that it is safe. Really need clarity on which is the case here.
It would also be beneficial to understand why it is safe to help reason about other cases. Is the guarantee that a ref
the interior of a struct
is sufficient to keep the entire contents GC rooted even if it's on the stack? Do all our runtimes hold that guarantee? Etc ...
Practically, the Jit is super pessimistic about TypedReferences currently, so the original example will always result in the local being address-exposed.
@SingleAccretion I think the question is not what the JIT does today but what the runtime/GC require.
That is, would it be considered invalid for a sufficiently advanced compiler (say Mono LLVM or the Unity Burst compiler) to do the requisite analysis and determine that a ref
to s.Age
(for some local S s
) can never access s.Field
and therefore s.Field
is "dead".
What about for cases such as MemoryMarshal.CreateSpan(ref Unsafe.As<int, byte>(ref s.Age), Unsafe.SizeOf<S>())
or even MemoryMarshal.CreateSpan(ref Unsafe.As<S, byte>(ref s), Unsafe.SizeOf<S>())
? (ignoring other "unsafeness" here, such as auto layout causing different field ordering)
The question is basically, does the runtime require that a ref s.Field
keep the entirety of s
alive and accessible (even if s
is a local on the stack). If it does not, then this is potentially unsafe.
@jaredpar I agree with @SingleAccretion, this appears to be the same logically. I am referencing section III.4.19 for mkrefany
.
The mkrefany instruction supports the passing of dynamically typed references. ptr shall be a pointer (type &, or native int) that holds the address of a piece of data. class is the class token (a typeref, typedef or typespec; see Partition II) describing the type of ptr. mkrefany pushes a typed reference on the stack, that is an opaque descriptor of ptr and class. This instruction enables the passing of dynamically typed references as arguments. The callee can use the refanytype and refanyval instructions to retrieve the type (class) and address (ptr) respectively of the parameter.
The key phrase to me is "This instruction enables the passing of dynamically typed references as arguments", which means if that is its purpose it should be enough to extend the lifetime of the entire instance.
The question is basically, does the runtime require that a ref s.Field keep the entirety of s alive and accessible (even if s is a local on the stack). If it does not, then this is potentially unsafe.
This seems to be a different question though. The original statement was top down - does keeping a local extend all fields and the answer is "yes". The above seems to be the inverse, does referencing a field extend the entire enclosing structure, which I believe is "no".
The above seems to be the inverse, does referencing a field extend the entire enclosing structure, which I believe is "no".
Which then leads to, a TypedReference
is today just the ByReference<byte> _value;
and IntPtr _type
fields.
Does a runtime then need to special-case TypedReference
and therefore this is only safe in the context of TypedReference
? Based on what I understand from the above, this is the case and its only safe because of TypedReference
. They happen to fall out in the current JIT/runtime, but some other sufficiently advanced compiler may need to differentiate between them.
That is, a user could not define their own struct (as below) and expect the same guarantees
public ref struct MyTypedReference
{
private readonly ref byte _value;
private readonly IntPtr _type;
}
@jaredpar do you want to change the title to say "Understanding codegen behavior around ref and struct locals"? 😀
@Maoni0 sure. Not quite sure where the boundary is between codegen and GC here. If it's codegen then i'll update the title
I think the question is not what the JIT does today but what the runtime/GC require.
Yes. My answer (modulo the part about TypedReference
s being pessimized) is answering that question.
To better understand this: let's start with a very simple liveness model: everything is live everywhere. It is always correct. Any compiler can then try and "tighten" the liveness boundaries if it can determine the full set of defs and uses that occur, as long as the functional result of the program is the same, and the source IL is correct (it is in the example above).
It is inconsequential whether the ref
points to a struct or a field inside one, or to a local, what matters is that if local.Field
gets collected too early (for whichever reason), we've performed an incorrect optimization.
Whether its collected "too early" is entirely dependent on what the runtime and GC require around liveness tracking.
As far as I can tell, there is nothing that requires a regular ref
to local.Field1
keep the entirety of local
alive in which case a sufficiently advanced runtime is "free" to collect local.Field2
if the only thing that exists is a byref
to local.Field1
.
Otherwise, the runtime is effectively prohibited from doing any kind of promotion, enregistration, or other similar optimizations when a byref exists to any field of a local, even if it was otherwise able to statically prove that the byref was only ever to a single field.
And so its a question of, is this actually prohibited by the runtime and a requirement for any conforming implementation. Or, is it simply a byproduct of the current runtime implementation being "not sufficiently advanced".
-- Noting I'm also perfectly fine being wrong here. I just want to ensure the right question is answered.
For the language, it technically matters outside the scope of just our own runtime (RyuJIT
) and our own GC. It matters what Unity Burst or Mono LLVM or even some new compiler is allowed to do per the ECMA-335 spec (plus new addendums tracked by dotnet/runtime).
I just want to ensure the right question is answered.
This is my primary goal as well. I want to understand what our rules are here, particularly where they intersect C# features.
I do feel the original example I posed around TypedReference
is important because we have net7.0
scenarios that depend on that pattern being safe. If the rules don't presently allow this to be safe then I think it's important to identify that now so we can work on mitigations.
Let me try and share my understanding on this front.
ref struct S
{
public string a;
public string b;
}
S s = default;
var tr = MakeTR(ref s)
// All of 's' is valid. An address of the local was taken.
S s = default;
var tr = MakeTR(ref s.a);
// Only field a is for sure valid. There is nothing requiring b to be retained.
I don't see how TypedReference
's implementation is a concern here. If that could be clarified with a crisp example that would be helpful.
// All of 's' is valid. An address of the local was taken.
I think this is the point. You seem to be taking it for granted that taking the address of a local is sufficient to keep the local, and all its state, rooted for the duration of the method. Is that the case though and does the codegen guarantee that? If yes then I think everything we need falls out from that.
Closing the loop.
Is that the case though and does the codegen guarantee that?
Yes.
The intuition here is that when taking an address of a stack local, it is possible to pass that out of the current scope. This means that unless the JIT can guarantee that the scope taking the ref
isn't using some part of the local, it can't drop any field.
ref struct S
{
public string a;
public string b;
}
S s = default;
var tr = NonInlinableFunction(ref s)
// All of 's' must be kept because the JIT can't see into the non-inlinable assembly and confirm what is or isn't used.
S s = default;
var tr = InlinableFunction(ref s)
// Potential optimization because the JIT can see all uses of the fields.
Closing out as we verified the behavior here. Thanks everyone!
Consider the following code:
Is the access of
copy.Field
safe?The variable
local
is unused after the__makeref
call and should be eligible for collection. It's unclear if theTypedReference
instance is sufficient to GC root the entirestruct
. Consider the structure ofTypedReference
is effectively:So in this case
tr
is effectively holding aref byte
pointing to the firstbyte
ofS.Age
. If a GC occurs before the__refvalue
call is it possible thatlocal.Field
gets collected? Given that theTypedReference
in this case points to astruct
local, and not field within a reference type, it's unclear if that is sufficient to GC root the entirestruct
contents.Wanted to get a bit of clarity on this as it's come up a few times in customer chats and is related to the upcoming fast reflection work.