Open ShadedBlink opened 1 month ago
This is not simply achievable as an API. For example, Memory<T> has to use two fields to store the array and offset, instead of one ref like Span<T>. The GC needs to be tweaked to allow interior references from heap objects. The new ref type will only be allowed to point to heap locations.
The C# syntax is the last piece for such a feature. It should first be designed for the CLR type system, like #63768 and #32060 for ref fields on the stack.
Also note that the value would be much lower than that of ref fields in spans. Unlike span access on the stack, heap spilling and task suspensions have much more overhead than retrieving fields.
Another approach would be a Memory<T>-like HeapRef<T> struct, just without the length field and MemoryManager support. Example API:
public readonly struct HeapRef<T>
{
    private readonly object _object;
    private readonly nint _offset;
    public HeapRef(object obj, RuntimeFieldHandle field); // safe constructor
    public HeapRef(T[] array, nint offset);
    public HeapRef(string str, nint offset);
    public unsafe ref T Get() => ref Unsafe.As<byte, T>(ref Unsafe.AddByteOffset(ref Unsafe.As<RawData>(_object).Data, _offset));
}
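As a rough illustration of the object + offset idea, the array case can already be written in safe C# by storing the array and an element index. The ArrayRef<T> and Holder names below are hypothetical and not part of the example API above; fields and strings would still need the runtime support discussed here.

```csharp
using System;

// Demo: the reference is stored inside a heap object and still resolves to the array slot.
var data = new[] { 10, 20, 30 };
var holder = new Holder(new ArrayRef<int>(data, 1));
holder.Slot.Get() += 5;
Console.WriteLine(data[1]); // 25

// Hypothetical array-only counterpart of HeapRef<T>: object + index, no interior pointer.
public readonly struct ArrayRef<T>
{
    private readonly T[] _array;
    private readonly int _index;

    public ArrayRef(T[] array, int index)
    {
        ArgumentNullException.ThrowIfNull(array);
        if ((uint)index >= (uint)array.Length)
            throw new ArgumentOutOfRangeException(nameof(index));
        _array = array;
        _index = index;
    }

    // Turns the stored pair back into a normal ref on demand.
    public ref T Get() => ref _array[_index];
}

// A plain class can hold the struct as a field, which is not possible with a ref field.
public sealed class Holder
{
    public readonly ArrayRef<int> Slot;
    public Holder(ArrayRef<int> slot) => Slot = slot;
}
```

The cost compared to a real interior reference is the extra index add and bounds check on each Get() call.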
This is not simply achievable as an API. For example, Memory<T> has to use two fields to store the array and offset, instead of one ref like Span<T>. The GC needs to be tweaked to allow interior references from heap objects. The new ref type will only be allowed to point to heap locations.
Yes, I understand that, that's why I posted this proposal here, not in the csharplang repo. As for GC tweaks, I believe they will be less painful since ref fields were added not long ago. This proposal just extends that feature to object fields. Yeah, it still may be problematic, but definitely not impossible.
Also note that the value would be much lower than that of ref fields in spans. Unlike span access on the stack, heap spilling and task suspensions have much more overhead than retrieving fields.
I believe that there should not be any noticeable performance overhead for this proposal. The target object resolving behaviour should be exactly the same as the current ref behaviour, because safe ref works exactly like ref. safe is just a modifier that limits usage of ref to objects. The GC's search for live objects should work the same as the current object reference and ref collection. The GC already iterates over all reference fields, we just add ref fields to them as well. As for calculating the byte offset to reach the original object, we are already doing it for ref, so there is nothing new.
The practical value of this implementation may not be as large as that of ref fields in structs, but it is also not as hard to develop, because most of the prerequisites were already implemented with the ref fields feature; we just need to extend and adapt them for this case.
Another approach would be a Memory<T>-like HeapRef<T> struct, just without the length field and MemoryManager support.
Very good idea that I am already using in another form, and it can also be used for backwards compatibility like the good old ByRef<>. Yet it is too rough to include in any public implementation, because you have to manually resolve and store offsets.
I believe that there should not be any noticeable performance overhead for this proposal. The target object resolving behaviour should be exactly the same as the current ref behaviour, because safe ref works exactly like ref. safe is just a modifier that limits usage of ref to objects. The GC's search for live objects should work the same as the current object reference and ref collection. The GC already iterates over all reference fields, we just add ref fields to them as well. As for calculating the byte offset to reach the original object, we are already doing it for ref, so there is nothing new.
No, it works differently for the GC. The GC currently only expects refs from the stack, not from the heap itself. I can't say how much effort the GC work would require, but it does need changes.
The practical value of this implementation may not be as large as that of ref fields in structs, but it is also not as hard to develop, because most of the prerequisites were already implemented with the ref fields feature; we just need to extend and adapt them for this case.
They are indeed very different in terms of implementation. The GC already supported interior refs, so the work for ref fields was mostly in the type loader. On the other hand, to enable heap references within the heap, the GC has to be changed as well as the type loader.
ref has long been a concept in the CLR type system. The ref field feature only extends where it can exist. However, the new heap reference is an entirely new concept, although it works similarly to ref. There needs to be enforcement or verification to prevent stack references from being stored into the heap, etc.
A new Memory<T>-like struct would really be the simpler and more reliable solution. That is probably why Memory<T> was implemented with 3 fields while Span<T> has 2. The extra addition instruction is really negligible.
Yet it is too rough to include in any public implementation, because you have to manually resolve and store offsets.
If we have that, we can have the C# language and the JIT cooperate to simplify it. For example, the C# compiler can compile magicsyntax(obj, field1) into new HeapRef<Field1Type>(obj, constant(FieldHandle)), and the JIT can see the object type and constant field handle and emit the offset as a constant.
Would something along these lines be possible? It's more widely applicable (since it can work with all refs, even if you no longer know the object it's a part of)?
namespace System.Runtime.InteropServices;
//or similar scary namespace as this type is not intended to be safe (this is the low-level, max performance API for it) - you could write safe wrappers also or on top yourself and do basically whatever with the below design though
//a safe one would probably need a class to control when it gets copied, etc.
//also could have a safe c# feature built on top to only allow provably safe uses (e.g., only on heap), but would be nice for it to still support unmanaged memory and null like with normal interior refs, but getting these would obviously require some amount of unsafe code & would require manual verification
/*
When a GC happens:
- If _interiorRef is not null - _interiorRef is set to null & it is converted to object + offset, or null + offset (for unmanaged pointer) when the GC happens
- Otherwise, we can just rely on looking at _object to see what object we need to keep alive
*/
unsafe struct InteriorRef<T>
{
    //we could technically probably get away with just object + offset if we have a sentinel object meaning "this is an interior ref" - I've ignored that to make it simpler since it's just an optimisation
    private void* _interiorRef; //gc has special knowledge about this field
    private object? _object;
    private nuint _offset;

    public readonly ref T Reference
    {
        get
        {
            //nogc begin
            var interiorRef = _interiorRef;
            if (interiorRef != null) return ref *(T*)interiorRef; //we haven't converted to object + offset yet
            else return ref Unsafe.As<byte, T>(ref Unsafe.Add(ref Unsafe.GetFieldData(_object) /* returns null ref for null object */, _offset));
            //nogc end
        }
    }

    public void Set(ref T reference)
    {
        //nogc begin
        _interiorRef = Unsafe.AsPointer(ref reference); //gc will treat as an interior ref at next relevant point in time, when it converts it to object + offset
        _object = null;
        _offset = 0;
        //nogc end
    }

    public void SetWithObject(ref T reference, object? o)
    {
        //nogc begin
        _interiorRef = null;
        _object = o;
        _offset = (nuint)Unsafe.ByteOffset(ref Unsafe.As<byte, T>(ref Unsafe.GetFieldData(o) /* null byref for null */), ref reference);
        //nogc end
    }

    public readonly void CopyTo(out InteriorRef<T> value) //safe helper for copying, since normal copy wouldn't necessarily stop GC from running half-way in (may not be necessary if that's easy to do)
    {
        //nogc begin
        value = this;
        //nogc end
    }

    public readonly InteriorRef<TOther> UnsafeAs<TOther>() => ...; //same idea as CopyTo - a correct method for re-interpreting the reference (while still being able to keep as object + offset if already changed to that), still an unsafe op though obviously
    //other: UnsafeAddByteOffset, and any other strictly necessary APIs (to avoid making unnecessary work for GC converting interior ref to object + offset as much as possible)
}
@huoyaoyuan
No, it works differently for the GC. The GC currently only expects refs from the stack, not from the heap itself. I can't say how much effort the GC work would require, but it does need changes.
Of course it will need changes, I don't object to that. I understand that it is not implemented yet, I just want to say that we have pretty similar mechanics that are already in use by the GC. It already iterates over each object in the heap to collect normal object references, so we can add ref collecting there, i.e. walk over both references and refs. When the GC encounters a ref during that iteration, it just has to apply the same logic that it already uses on the stack.
They are indeed very different in terms of implementation. The GC already supported interior refs, so the work for ref fields was mostly in the type loader. On the other hand, to enable heap references within the heap, the GC has to be changed as well as the type loader.
I don't know the implementation details of the GC in these matters, I just see that a safe ref and an object reference are pretty similar. It's more like a safe ref is a normal object reference with an offset. Such similarity makes me think that we just have to extend the normal object reference logic to support offsets. As far as I know, the current ref implementation works this way: the target object is resolved through the given address. If we reuse this logic in the object reference resolving mechanism, it should deal with this case - this won't be a third reference type, it would be just an extension over normal object references.
There needs to be enforcement or verification to prevent stack references from being stored into the heap, etc.
We can just use the same rules that we use for object references at the moment - any way to create or modify a safe ref outside of a normal ref obj.Field should be treated the same as an attempt to create or modify an object reference.
If we have that, we can have the C# language and the JIT cooperate to simplify it. For example, the C# compiler can compile magicsyntax(obj, field1) into new HeapRef<Field1Type>(obj, constant(FieldHandle)), and the JIT can see the object type and constant field handle and emit the offset as a constant.
I am totally okay with that, it looks like a temporary solution like the old ByRef<> that is transparent to users and can later be transformed into a true native safe ref implementation without any changes to user code.
Would something along these lines be possible? It's more widely applicable (since it can work with all refs, even if you no longer know the object it's a part of)?
Actually I see no difference from @huoyaoyuan's proposal of HeapRef<>. The problem that I see in this approach is not in performance, but in usability. You have to pass both object and field and also have to make sure that they are connected. And manual construction of a custom type when you just need a reference to a damned int looks even more sad.
Yeah, we need C# support for safe ref at least. It won't be as ugly if these InteriorRef<>, HeapRef<> or SafeRef<> aren't directly mentioned in user code.
If we have that, we can have the C# language and the JIT cooperate to simplify it. For example, the C# compiler can compile magicsyntax(obj, field1) into new HeapRef<Field1Type>(obj, constant(FieldHandle)), and the JIT can see the object type and constant field handle and emit the offset as a constant.
So for this approach we have to add a HeapRef<> to the dotnet API and add a safe ref concept to C#, i.e. when an argument or field is declared as safe ref, it is compiled as an instance of HeapRef<>. C# then should insert conversions between HeapRef<> and ref.
safe ref int x compiles to HeapRef<int> x
ref obj.Field compiles to new HeapRef(obj, in obj.Field)
ref safeRef compiles to ref safeRef.Get()
fixed (int* ptr = &safeRef) compiles to fixed (int* ptr = &safeRef.Get())
As far as I know, the current ref implementation works this way: the target object is resolved through the given address.
Only at gc time as required. It does not explicitly track the object, it points directly to the field within the object itself.
Actually I see no difference from @huoyaoyuan's proposal of HeapRef<>. The problem that I see in this approach is not in performance, but in usability. You have to pass both object and field and also have to make sure that they are connected. And manual construction of a custom type when you just need a reference to a damned int looks even more sad.
The difference is that ref does not explicitly track the object. You cannot find the object from a ref yourself today - this is something only the gc can do. If we got GetFieldData you would be able to easily implement the HeapRef<T> mentioned, but you would not be able to convert a ref T to a HeapRef<T> in general (you'd need to change the code all the way back when you got the ref itself). Whereas my InteriorRef<T> explicitly supports all valid ref Ts due to its requirements for the GC.
As I said, you would have appropriate safe APIs / language features built on top of this low level API. If you never wanted to touch the low level api in your code, you shouldn't have to, but if you needed it then you'd be able to.
My approach supports all the things HeapRef<T> does (via its SetWithObject API), but also supports setting from a normal interior reference (ref T). It also comes with added implementation complexity as a result (that is, the GC needs to be able to update the fields in the way I mentioned).
Yeah, we need C# support for safe ref at least. It won't be as ugly if these InteriorRef<>, HeapRef<> or SafeRef<> aren't directly mentioned in user code.
I have no issue with a safe c# feature existing obviously, but having the low level types is important for some of us, and I for one would certainly have uses for them in my code.
Only at gc time as required. It does not explicitly track the object, it points directly to the field within the object itself.
Well, that's exactly when ref object is required. You keep a safe ref int x because you need an integer address, not an initial object. Keyword safe is just needed to ensure that ref will be alive till the moment you don't need it anymore. Even more, it would be right to ensure that user code won't be allowed to discover the initial object.
The difference is that ref does not explicitly track the object. You cannot find the object from a ref yourself today - this is something only the gc can do.
Of course, ref tracks objects implicitly. And that's exactly what is needed. All we need the object for is to keep it alive, thus we don't care whether we can get the object itself - all we need is that the GC will keep it alive because we still have that safe ref, which is treated as a live reference. And that's exactly why I noted in the proposal that you can convert safe ref to ref (because ref will still keep the object alive as ref can't escape the stack), but you can't convert ref to safe ref as ref is not guaranteed to refer to an object, thus ref is not guaranteed to be still valid on the heap.
If we got GetFieldData you would be able to easily implement the HeapRef<T> mentioned, but you would not be able to convert a ref T to a HeapRef<T> in general (you'd need to change the code all the way back when you got the ref itself). Whereas my InteriorRef<T> explicitly supports all valid ref Ts due to its requirements for the GC.
Your InteriorRef<> is unsafe because it just stores the ref as a pointer. We can easily create an instance of your InteriorRef<> from a stack variable then return this struct, thus immediately rendering its pointer address invalid, and your InteriorRef<> won't be notified in any way. The GC never cares about stack pops, only the compiler does.
InteriorRef<int> Method() {
    var x = 123;
    var result = new InteriorRef<int>();
    result.Set(ref x);
    return result;
}
Your InteriorRef<> is unsafe because it just stores the ref as a pointer.
As I mentioned in the second block of comments - the GC would be expected to handle it specially for updating it and liveness analysis as to not cause issues by just storing it as a "normal pointer":
When a GC happens:
- If _interiorRef is not null - _interiorRef is set to null & it is converted to object + offset, or null + offset (for unmanaged pointer) when the GC happens
- Otherwise, we can just rely on looking at _object to see what object we need to keep alive
The above is even a simplified explanation of what would most likely actually happen, as e.g., not all GCs need to look at all objects - but it should get the idea across of how it could work.
We can easily create an instance of your InteriorRef<> from a stack variable then return this struct, thus immediately rendering its pointer address invalid, and your InteriorRef<> won't be notified in any way. The GC never cares about stack pops, only the compiler does.
It's intended to be a low-level API that higher-level safe APIs can be built on, like I mentioned. This is like how Span<T> and its extension methods are built on top of Unsafe, MemoryMarshal, unsafe blocks, etc.
Keyword safe is just needed to ensure that ref will be alive till the moment you don't need it anymore
Normal refs keep things alive too... The only difference with your "safe" ref is that its value must have a specific lifetime (that being - must refer to heap memory) - this has no impact on the viability of runtime/GC support for refs being stored on the heap, only for verifiability/safety (which I'm not saying is meaningless/unimportant to be clear). Unless I'm missing something or not understanding what you're saying, your safe ref sounds like a purely language feature which needs to be built on top of a runtime/GC feature of allowing some form of interior references to be stored on the heap. I don't see any reason to disallow unmanaged/stack references for these from a runtime/GC POV: it's quite easy for the GC to check if a managed pointer may need to be updated & tracked to keep its target object alive - the more difficult part is figuring out precisely which object; see GetContainingObject in gc.cpp for example, it calls into is_in_find_object_range which does 3 comparisons basically to figure out if it is definitely a non-managed-heap pointer very quickly (insert asterisks, but you get the idea). There are also provably safe/valid uses of such things (e.g., giving a ref that happens to point into the stack, but that you know won't be used after the method exits (also it's not passed across threads), or pointing into native memory where you'd only deallocate it when it's not being used anymore, etc.), but it would be up to the programmer to verify that it is actually safe without stuff like delegate "lifetimes" or similar. That is why a language feature could say "you can only safely create references to the heap for this" but should not disallow unsafe code (by which I am referring specifically to the programmer-instead-of-compiler-verified-safety meaning of unsafe) from creating non-managed-heap byrefs for this feature.
Only at gc time as required. It does not explicitly track the object, it points directly to the field within the object itself.
Well, that's exactly when ref object is required. You keep a safe ref int x because you need an integer address, not an initial object.
The reason I clarified this is because you demonstrated you didn't know the difference:
Actually I see no difference from @huoyaoyuan's proposal of HeapRef<>.
You have to pass both object and field and also have to make sure that they are connected.
To clarify again, @huoyaoyuan's proposal requires you to know which object it comes from at construction time (by design), whereas mine doesn't require that you know this (but can still benefit from this info when available).
As I mentioned in the second block of comments - the GC would be expected to handle it specially for updating it and liveness analysis as to not cause issues by just storing it as a "normal pointer"
tldr: They are opposite in terms of lifetime - ref is used to limit usage of reference to the target's scope, while safe ref is used to extend usage of reference outside of any scope.
GC is not involved with stack pops at all. You return from method - GC is not even invoked. That's why all refs are stack-bound - the GC can't keep stack references alive because that would require a GC to be performed on every return from any method.
In case of ref, the ref lifetime is limited to the target's scope (ref ensures that ref lives no longer than its target), while in case of safe ref the target's lifetime is extended to the safe ref lifetime - it ensures that the target object lives no less than the safe ref. ref can't escape its scope because it is not guaranteed that its target exists outside of its scope, while for safe ref it is guaranteed that its target is definitely not limited to any scope.
I understand that you want to have single container for both object and free references, but it is still unsafe till you solve given problem:
InteriorRef<nint> CreateRef(nint value) {
    var x = value;
    var result = new InteriorRef<nint>();
    result.Set(ref x); // No error since we set a real reference.
    return result; // No error since InteriorRef<> is not limited to current scope.
}
nint Increment(ref nint x) {
    var oldValue = x;
    x = x + 1;
    return oldValue;
}
void Main() {
    ref var x = ref CreateRef(1).Reference;
    Increment(ref x);
    Assert.IsTrue(x == 1); // Error! x is 2!
}
In the given example, when you return from CreateRef() you get a reference to a stack variable that won't be cleared on return from CreateRef(), because the GC is not invoked on method return. It is too heavy to invoke GC cleanup for refs on every return from a method. When you enter Increment(), the variable oldValue is allocated in the same place where x was previously allocated in CreateRef(), thus in Increment() &x is equal to &oldValue.
That's exactly why your proposal is unsafe. If you just want a pointer that can keep an object alive when it points into one, your proposal should then be called unsafe ref. Of course, there is a point in such a tool as well, but it is pretty different from my proposal of safe ref, and I don't think that they can be compared - they have different purposes.
Please read all of my response, but the main point is this:
There are a few reasons that interior refs are not allowed on the heap, including lifetime (which needs a language feature to be solved). But one you haven't addressed at all is mentioned in this article when spans initially came out:
These references are called interior pointers, and tracking them is a relatively expensive operation for the .NET runtime’s garbage collector. As such, the runtime constrains these refs to only live on the stack, as it provides an implicit low limit on the number of interior pointers that might be in existence.
My proposal attempts to solve this by making it so they stay as "interior pointers" as opposed to object + offset for no longer than needed. As I've already mentioned, the expensive part is not for interior references that point off of the managed heap, as these can be detected in 3 comparisons, it's the interior references that point into the managed heap (i.e., as you've proposed it, every safe ref).
There are also atomicity concerns, e.g., consider if Span<T> was allowed to be on the heap and 2 threads tried to write to it at once, you might end up with the byref from one and the length from another, which leads to an obviously incorrect span. Your safe ref wouldn't fix this issue either, as it's not really possible to fix this except by safe API design, e.g., you could have a class (not a struct) that stores InteriorRef<T> + a length and disallows mutation for a HeapSpan<T> - this would be safe since you can't get tearing due to it only ever being mutated by the creating thread and it's never mutated again.
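To make the "class that disallows mutation" idea a bit more concrete, here is a minimal sketch that can be written today. Since InteriorRef<T> does not exist, it assumes an array + start index in its place; the HeapSpan<T> members shown are hypothetical.

```csharp
using System;

var buffer = new byte[] { 1, 2, 3, 4, 5 };
var shared = new HeapSpan<byte>(buffer, 1, 3);
shared.AsSpan()[0] = 42;
Console.WriteLine(buffer[1]); // 42

// Hypothetical immutable wrapper. The fields never change after construction,
// so there is no window in which a reader could combine the reference from one
// write with the length from another. Swapping in a different HeapSpan<T>
// replaces a single object reference, which is an atomic write.
public sealed class HeapSpan<T>
{
    private readonly T[] _array; // stands in for the proposed InteriorRef<T>
    private readonly int _start;
    private readonly int _length;

    public HeapSpan(T[] array, int start, int length)
    {
        ArgumentNullException.ThrowIfNull(array);
        if ((uint)start > (uint)array.Length || (uint)length > (uint)(array.Length - start))
            throw new ArgumentOutOfRangeException(nameof(start));
        _array = array;
        _start = start;
        _length = length;
    }

    public int Length => _length;

    // Materialises a stack-only Span<T> over the stored range.
    public Span<T> AsSpan() => new Span<T>(_array, _start, _length);
}
```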
tldr: They are opposite in terms of lifetime - ref is used to limit usage of reference to the target's scope, while safe ref is used to extend usage of reference outside of any scope.
Ok, so you agree that the only difference between your safe ref and a normal ref is the lifetime difference, and hence this doesn't address the runtime/GC limitations (which my proposal attempts to address).
GC is not involved with stack pops at all. You return from method - GC is not even invoked.
Ok...? GC involvement for any InteriorRef<T>s would be something along the lines of: check if it's not converted to object + offset, if not: find the object it's in (which normal ref T also does, and safe ref T would also need to do) & convert to object + offset, treat like normal object reference. i.e., it's not meaningfully different to ref T in the worst case, or object once converted.
I understand that you want to have single container for both object and free references, but it is still unsafe till you solve given problem
What is there to solve? It's a low level/unsafe API. Yes, you can use it incorrectly, just like with any other low-level/unsafe API. If/when we get the ability to mark APIs unsafe, it would be marked as such (which is why I put it in the System.Runtime.InteropServices namespace, which has a bunch of other scary APIs - you wouldn't stick it straight in System for example - it could also go in CompilerServices or similar too though, that would be up to API review), along with the likes of Unsafe, MemoryMarshal, etc. You can use all of these APIs wrong and make everything crash and burn horribly - but you can still build safe APIs on top of them just fine.
Of course, there is a point in such a tool as well, but it is pretty different from my proposal of safe ref, and I don't think that they can be compared - they have different purposes.
Your proposal is a language proposal. It doesn't seem to even address the runtime/GC limitations, which is what my proposal attempts to do. No one is arguing that there should not be a safe c# feature that exists, but what is it going to be built on? This is simply something that cannot work right now (unless you do @huoyaoyuan's version, which e.g., cannot support these operations that you mention ref -> safe ref // unsafe, pointer -> safe ref // unsafe - they would be impossible to implement) due to runtime/GC limitations, and my proposal tries to address that with a potential solution to the problem of byrefs not being allowed on the heap.
Thank you for the detailed suggestion and motivation. I've written code that does something like this before, so I can see why it's desirable.
However, I strongly oppose exposing something like this in the BCL or in C# as a built-in, especially if it requires underlying runtime/GC changes. This is primarily for taste/maintainability reasons.
I think encouraging the use of long-lived interior references pointing at fields of GC objects is bad because it will promote tight coupling between distinct pieces of code, in a way that's hard to refactor and maintain.
Unsafe.XXX, ref, and out are ideal for performance-sensitive scenarios where, when used carefully, they allow stripping away layers of abstraction, removing heap allocations, and removing bounds checks. But they need to be used thoughtfully and sparingly.
The main scenarios for this proposal I can think of are all asynchronous, either async or a delegate that fills in a field later on. I have personal experience with both, where I implemented it using a target object + a FieldInfo.
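For readers unfamiliar with the target object + FieldInfo pattern, a minimal sketch might look like the following. This is not the code referred to above; the FieldReference<T> and Counter types are hypothetical and only show the general shape using standard reflection calls.

```csharp
using System;
using System.Reflection;

var target = new Counter();
var field = typeof(Counter).GetField(nameof(Counter.Value))!;
var fieldRef = new FieldReference<int>(target, field);
fieldRef.Value = 7; // written through reflection, e.g. from a late callback
Console.WriteLine(target.Value); // 7

public sealed class Counter
{
    public int Value;
}

// Hypothetical long-lived "reference" to a field: keeps the owner alive
// via _target and reaches the field through FieldInfo.
public readonly struct FieldReference<T>
{
    private readonly object _target;
    private readonly FieldInfo _field;

    public FieldReference(object target, FieldInfo field)
    {
        ArgumentNullException.ThrowIfNull(target);
        ArgumentNullException.ThrowIfNull(field);
        if (field.FieldType != typeof(T))
            throw new ArgumentException("Field type does not match T.", nameof(field));
        _target = target;
        _field = field;
    }

    public T Value
    {
        get => (T)_field.GetValue(_target)!;
        set => _field.SetValue(_target, value);
    }
}
```

Reflection access boxes value types and is far slower than a direct field access, which is part of why this pattern only suits scenarios where that cost is dwarfed by the surrounding work.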
In these scenarios, the performance difference between a managed field reference and invoking a delegate or an interface property setter is basically insignificant compared to the time being spent running async machinery and performing the work in question to shuttle a result around. So I don't think the possible performance win justifies new functionality for this scenario, especially if it complicates the GC (which would introduce possible regressions in GC pause times).
To go in deeper on the coupling situation - most scenarios that call for this are probably better solved using an interface or getter/setter delegates. An interface provides a clean surface where you have a well-named interface and a well-named property, and the requirement to use a property instead of a field makes the code more maintainable because if your internal field layout changes, the property can become an adapter to support the new internal field layout. I've personally had to make changes of this sort many times in my career to keep code working, and if I were touching fields directly - or worse, taking long-lived references to them - that maintenance would become much harder. Performance for interfaces on the modern runtime can be quite good, and as mentioned above, the interface property access overhead is not likely to be the hotspot if you're running async code.
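A small sketch of the interface-based alternative described above, assuming an async producer that hands back a single result. All names here (IResultSink<T>, Order, Worker) are hypothetical and only illustrate the shape of the design.

```csharp
using System;
using System.Threading.Tasks;

var order = new Order();
await Worker.ComputeAsync(order);
Console.WriteLine(order.Total); // 42

// Hypothetical sink interface: the async code depends on this, not on a field layout.
public interface IResultSink<T>
{
    T Result { set; }
}

public sealed class Order : IResultSink<int>
{
    private int _total; // can be renamed or restructured without touching callers

    public int Total => _total;

    // The property acts as the adapter between the interface and the internal field.
    int IResultSink<int>.Result { set => _total = value; }
}

public static class Worker
{
    public static async Task ComputeAsync(IResultSink<int> sink)
    {
        await Task.Yield(); // stand-in for real asynchronous work
        sink.Result = 42;
    }
}
```

The sink object is an ordinary reference, so it survives the await without any new runtime or GC support.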
I am also mildly concerned about how this might interact with out scenarios and readonly fields. Both are not uncommon, so I can imagine seeing users try to apply this new functionality to both in the wild and cause fun and exciting new problems as a result.
Some real world scenarios for this proposal might be better solved by UnsafeAccessor, as well, if the performance really does matter.
I would love to see a more "real" scenario justifying the need for this functionality if the people advocating for it have one to share, even in the abstract. It can be hard to reason about the advantages of something like this when just looking at a toy example.
Something narrowly scoped like 'roslyn can turn obj.FieldName at compile time into new FieldReference(obj, fieldof(T, FieldName))' could probably pass muster for me but you'd still have to get it into the compiler and BCL, and I'm still not sure I see the value of it even when the risk is reduced like that.
Background and motivation
Sometimes there are cases when you have multiple fields in an object and want to store a reference to one of them for a long time, like in async code. Today you have to keep a reference to the object and then pick the required field in the target method.
Stack references won't survive the first await, while object references are still alive. In the following case the first part is totally safe, but it won't compile because MethodAsync() is not sure whether output would be alive till the method's end.
API Proposal
To solve this problem we can implement a safe ref type of reference, which is guaranteed to be alive till the end of usage.
Conversions:
fixed semantics: Same as with ref, we just pin the given object that can be discovered through this address.
API Usage
Alternative Designs
Risks
No response