S.R.CS.Unsafe: Add unsafe operations for ref returns and locals

dotnet / runtime

S.R.CS.Unsafe: Add unsafe operations for ref returns and locals #17968

Closed jkotas closed 4 years ago

jkotas commented 8 years ago

Roslyn is adding support for ref returns and locals (https://github.com/dotnet/roslyn/issues/118). S.R.CS.Unsafe should provide operations that allow taking advantage of ref returns and locals in unsafe code.

public static class Unsafe
{
    // Reinterprets the given reference as a reference to a value of type TTo
    public static ref TTo As<TFrom,TTo>(ref TFrom source);

    // Add element offset to the given reference.
    public static ref T Add<T>(ref T source, int elementOffset);

    // Subtract element offset from the given reference.
    public static ref T Subtract<T>(ref T source, int elementOffset);

    // Determines whether the specified references point to the same location.
    public static bool AreSame<T>(ref T a, ref T b);
}

Edit: Updated with the revised proposal

jkotas commented 8 years ago

@mellinoe @nietras @mikedn @VSadov @KrzysFR @terrajobst

nietras commented 8 years ago

TL;DR

I propose changing the API surface to:

public static class Unsafe
{
    public static ref U As<T, U>(ref T source);

    public static ref T Add<T>(ref T source, int elementOffset);

    public static bool Equals<T>(ref T a, ref T b);
}

My initial thoughts are centered around two things:

Necessity
Naming

Necessity

As far as I can tell, these new API additions are there for convenience only, since they can be expressed via the existing unsafe API surface.

AsRef

ref int r = ref Unsafe.AsRef<byte, int>(ref b[0]);

can be expressed via:

ref int r = ref Unsafe.AsRef<int>(Unsafe.AsPointer(ref b[0]));

Note how the existing API does not need to specify source type byte.

RefAdd (move to index)

ref int r1 = ref Unsafe.RefAdd(ref a[0], 1);

can be expressed via:

ref int r = ref Unsafe.AsRef<int>((byte*)Unsafe.AsPointer(ref a[0]) + Unsafe.SizeOf<int>() * 1);

Clearly, RefAdd is shorter. Although, naming is somewhat unclear, see below.

RefEquals

Unsafe.RefEquals(ref a[0], ref a[0]);

can be expressed via:

Unsafe.AsPointer(ref a[0]) == Unsafe.AsPointer(ref a[0]);

My View

Personally, I think these additions are worth making as it makes a lot of scenarios easier and they will allow using these outside an unsafe context which I assume is a goal here. Although, I would suggest different naming...

Naming

What struck me first is that Ref is superfluous: with ref we are in a "type safe" context, so there really is no need for Ref in the naming. That is, the API can be expressed instead simply as:

public static class Unsafe
{
    public static ref U As<T, U>(ref T source);
    public static ref T Add<T>(ref T source, int elementOffset); // Note I changed offset to elementOffset to make it clear
    public static bool Equals<T>(ref T a, ref T b);
}

This will allow writing the following:

ref int r = ref Unsafe.AsRef<byte, int>(ref b[0]);
ref int a1 = ref Unsafe.RefAdd(ref a[0], 1);
var sameAddress = Unsafe.RefEquals(ref a[0], ref a[0]);

instead as:

ref int r = ref Unsafe.As<byte, int>(ref b[0]);
ref int a1 = ref Unsafe.Add(ref a[0], 1);
var sameAddress = Unsafe.Equals(ref a[0], ref a[0]);

Since ref keywords are littered all over, Ref in the method names is redundant in my view. As @mikedn commented, with RefAdd/Add it is not particularly clear whether the offset is in bytes or elements. Other names for Add could be Index, At, Offset, AddOffset etc.

I think Add is probably best in terms of succinctness and intent, but the parameter should be named explicitly to clearly indicate the offset is in elements.
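To make the bytes-versus-elements question concrete, here is a minimal sketch using the names suggested above (As, Add). It assumes a runtime where System.Runtime.CompilerServices.Unsafe exposes these methods and a C# version with top-level statements; the array contents and variable names are made up for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

byte[] b = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Reinterpret the first four bytes as an int (source type first, target type second).
ref int r = ref Unsafe.As<byte, int>(ref b[0]);

// The offset is counted in elements of the ref's type: one int, i.e. four bytes.
ref int next = ref Unsafe.Add(ref r, 1);

// BitConverter reads with the machine's byte order, so it sees the same values.
bool firstMatches = r == BitConverter.ToInt32(b, 0);
bool nextMatches = next == BitConverter.ToInt32(b, 4);

Console.WriteLine($"{firstMatches} {nextMatches}");
```

Because the ref passed to Add is a ref int, an offset of 1 moves four bytes forward, which is why naming the parameter elementOffset helps.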

mikedn commented 8 years ago

As far as I can tell, these new API additions are there for convenience only, since they can be expressed via the existing unsafe API surface.

Nope, they cannot. The equivalents that you show are incorrect because the use of AsPointer introduces an intermediary unmanaged pointer.

public static ref U As<T, U>(ref T source);

We already have a ref returning method - AsRef. It seems to me that it would make sense that the new method is also called AsRef.

public static bool Equals<T>(ref T a, ref T b);

Seems confusing. I can imagine one asking "Does this method compare references or does it compare the referenced values?".

mikedn commented 8 years ago

I think the biggest problem with this API is that AsRef requires specifying both the source and the destination types. Unfortunately we'll have to live with that as there's no way to avoid this given the current language possibilities.

That said, I wonder if it wouldn't be better to reverse the generic arguments:

    public static ref U AsRef<U, T>(ref T source);

in the hope that a future language version could do some sort of partial type inference so AsRef<int>(ref floatVar) is treated as AsRef<int, float>(ref floatVar).

nietras commented 8 years ago

Nope, they cannot. The equivalents that you show are incorrect because the use of AsPointer introduces an intermediary unmanaged pointer.

Ah yes, I made the incorrect assumption that things would be fixed, which they don't need to be. Never mind the necessity argument then.

We already have a ref returning method - AsRef. It seems to me that it would make sense that the new method is also called AsRef.

For me, the Ref part of the existing AsRef implies a change in "reference type", i.e. from pointer to ref, not that it is working on ref values. AsRef<T,U> does not change from a pointer to a ref or similar; it simply changes the type from T to U.

What if in the future generic pointers are allowed, but changing type of the pointer is not allowed via casting for example, would the following API addition make sense?

public static U* As<T, U>(T* source);

Would this then have to be called AsPointer? To me this seems wrong. To me As<T,U> is in a closer relationship to As<T> than AsRef<T>.

Seems confusing. I can imagine one asking "Does this method compare references or does it compare the referenced values?"

Yes, I agree with that. Not sure RefEquals is the best name then, though. Why not use existing name found on object i.e. ReferenceEquals or would that also be confusing?

public static ref U AsRef<U, T>(ref T source);

This just seems counterintuitive to all other conventions in .NET. Perhaps, the compiler might as well infer it from the target e.g.

ref byte b = ref a[0];
ref int i = ref Unsafe.As(ref b); // Infer from assignment? Although it is almost too magical.

I suggested a fluent API for something like this when we first discussed the Unsafe API but the consensus was this would be bad from a perf perspective, since it introduces an intermediate "closure" type (although it should be JIT'ed away).

VSadov commented 8 years ago

The other two IL operations on refs that I have seen used are:

RefSubtract - atomic subtraction between two refs useful to get distance between refs to elements of the same array.
IsOnStack - used mostly in asserts

I do not claim that these are good names, but functionality could be useful.

nietras commented 8 years ago

@vsadov good suggestions. Out of interest, would you elaborate on what IsOnStack (or just OnStack) is used or could be used for?

mikedn commented 8 years ago

This just seems counterintuitive to all other conventions in .NET. Perhaps, the compiler might as well infer it from the target e.g.

Yeah, I suppose it doesn't make sense to do that in the hope that C# will ever do partial type inference like C++ does.

Why not use existing name found on object i.e. ReferenceEquals or would that also be confusing?

Hmm, I suppose it's fine. Whatever name suggests to the user that the references are compared and not the values :smile:.

IsOnStack - used mostly in asserts

To add to @nietras question: how would this be implemented?!

jkotas commented 8 years ago

Thank you for the great feedback! I like the suggestions.

IsOnStack - used mostly in asserts

Byrefs on stack are implicitly pinned, so it can be used to assert that it is safe to convert to raw pointer without pinning - example from CoreCLR. Unfortunately, there is no way to implement it in a portable way. It would have to be runtime or platform specific, which is not pretty given the current shape of the library. If it keeps showing up as a needed API, it should be looked into as a separate issue.

RefSubtract - atomic subtraction between two refs useful to get distance between refs to elements of the same array

I agree that it is useful operation to have. BTW: It is less useful with the current byref locals and returns than one may think because of the single assignment limitations of byref locals. I have noticed in my experiments that one tends to operate on indices and then convert to byref as the last step and never go back - the style of the pointer math is different from unmanaged pointers.

It may also be nice to have a Subtract variant that takes elementOffset for convenience and symmetry with Add.

So the updated proposal is:

public static class Unsafe
{
    public static ref U As<T, U>(ref T source);
    public static ref T Add<T>(ref T source, int elementOffset);
    public static ref T Subtract<T>(ref T source, int elementOffset);
    public static int Subtract<T>(ref T a, ref T b);
    public static bool ReferenceEquals<T>(ref T a, ref T b);
}

More suggestions for refinements are welcomed.

Including @jamesqo @KrzysztofCwalina that I forgot to include yesterday.

jamesqo commented 8 years ago

public static int Subtract(ref T a, ref T b);

Since we're taking 64-bit platforms into account, the return type should probably be long instead.

public static ref U As<T, U>(ref T source);

This overload is bothering me a little. It doesn't read very smoothly in my brain; if I saw something like var y = Unsafe.As<Foo, Bar>(ref x) in code, I would go 'OK, so this is converting something to a Foo and... what's that second type parameter doing there?', when in fact it was converting to a ref Bar and the first type is what's being copied from. The fact that the existing As overload has only 1 parameter only makes things more confusing, for example if As<int> returns an int I would expect As<int, ...> to also return something related to an int.

I think we should instead name this ConvertRef, which reads better and makes it clearer we're dealing with refs. e.g.

Unsafe.ConvertRef<Foo, Bar>(ref x); // convert *from* ref Foo *to* ref Bar

It would also work smoothly if a future version of C# adds type inference based on return type assignment, e.g. as mentioned above

ref byte b = ref a[0];
ref int i = ref Unsafe.ConvertRef(ref b); // We converted a ref, and now we have a ref int
ref int i = ref Unsafe.As(ref b); // As... what? And why is the parameter passed as ref? :(

Your thoughts @nietras @mikedn?

jkotas commented 8 years ago

return type should probably be long instead

If this is used on array elements, the 32-bit return type is sufficient because array indices can only be 32-bit integers in mainstream .NET runtimes. The problem only exists if somebody uses it for general pointer math on unmanaged pointers cast to refs, or on .NET runtime variants that allow arrays with >2G elements. A similar problem exists for Add as well: a 32-bit offset argument is not ideal for general pointer math on 64-bit platforms. Changing the offsets to a 64-bit type would make it lower performance on 32-bit platforms, and harder to work with. Another option is to make the offsets native int (IntPtr in C#), but it would again make it harder to work with because one cannot do much with IntPtr in C# directly today.

I think the best way out is to document that these operations only work well on arrays with <2G elements.

Unsafe.ConvertRef

Convert suggests a significant change of representation in my mind, e.g. going from bool to string. I agree with @nietras' observation that it should be called As because it is just like the existing casting method. The need to specify an extra generic argument is unfortunate, but there is not much that can be done about it without a language change.

jamesqo commented 8 years ago

@jkotas Agree with your points on Subtract, it should probably be documented that it won't work well for pointers whose difference can't fit in an int. Users who really need to get a long difference can just convert both refs to byte*s and take the difference of that / Unsafe.SizeOf<T>.
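A minimal sketch of getting a 64-bit element difference without pointers follows. It assumes Unsafe.ByteOffset, which is not part of the proposal in this thread but exists in later versions of System.Runtime.CompilerServices.Unsafe; the array and variable names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

long[] items = new long[16];
ref long first = ref items[0];
ref long target = ref items[12];

// ByteOffset returns a native-sized integer (IntPtr): target minus origin, in bytes.
IntPtr byteOffset = Unsafe.ByteOffset(ref first, ref target);

// Dividing by the element size gives the element distance as a 64-bit value.
long elementDistance = byteOffset.ToInt64() / Unsafe.SizeOf<long>();

Console.WriteLine(elementDistance);
```

Both refs are passed to a single call, so no unmanaged pointer is held across a point where the GC could move the array.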

I agree with @nietras' observation that it should be called As because it is just like the existing casting method.

Unfortunate that we didn't name the existing overload Cast instead of As earlier, then we could have CastRef and CastPointer with no confusion between the AsRef / AsPointer methods. 😞

If we are to keep it as As, I would at least suggest switching the type parameters. I know @nietras said earlier it was 'counterintuitive to all other conventions in .NET', but every existing method in the Unsafe class takes the destination parameter before the source parameter. I think we should keep it that way for symmetry.

Also regarding the other Add and Subtract overloads, maybe they should accept uint instead of a regular int? This will prevent people from writing redundant code like Add(ref source, -6) or (God forbid) Subtract(ref source, -10). Or perhaps the other Subtract could be omitted entirely, since Add with int is enough to express up to 2 ^ 31 elements in both directions (barring Subtract(ref source, int.MinValue), but I don't think that's going to be a common case).

nietras commented 8 years ago

perhaps the other Subtract could be omitted entirely

I think ref T Subtract<T>(ref T source, int elementOffset) should be omitted too, it seems redundant. Instead we could consider adding a second overload for Add which takes IntPtr. This still leaves the Subtract(ref, ref) without a method suitable for 64-bit. I do not think that is the biggest problem with Subtract(ref,ref), though. Does it return element offset or byte offset? There is no way to tell from the method signature. I believe it should be analogous to Add and return element offset, will that be the case?

I agree overall that the common use case is probably gonna be with 32-bit offsets and that these will be signed integers.

ref TTo As<TFrom, TTo>(ref TFrom source) vs ref TTo Convert<TFrom, TTo>(ref TFrom source)

@jamesqo I do agree that the readability of As in this case is not ideal. Perhaps there is another way, though. I remember we had the same discussion for pointers initially, that is we discussed something like TTo* Cast<TFrom, TTo>(TFrom* source), but this was quickly resolved since implicit conversions from T* to void* is allowed.

Should this not be allowed for refs as well? Can ref T not be implicitly converted to ref void? As far as I understand, C# allows implicit conversions that "lose" type information, that is from class T to object and from T* to void* etc. Would it then not make sense to allow ref T to ref void? That is, it should be possible to write:

ref int i = ref a[3];
ref void v = ref i; // Implicit conversion OK

I do not know if there is any reason for ref void not being allowed at all currently (compile fails in C# with CS1536 "Invalid parameter type void", C++/CLI shows "a reference to void not allowed" intellisense warning), other than it makes sense since we cannot assign a value to it. However, with ref locals and returns would ref void not make sense? Of course, it cannot be used for anything as such other than to hold an address, but is that not a valid purpose?

If ref void would be allowed, we could define the conversion simply as:

ref T As<T>(ref void source)

And be able to write:

ref int i = ref a[3];
ref byte b = ref Unsafe.As<byte>(ref i);

I do understand this requires language design changes not only for C# but also for C++/CLI, VB.NET and F# perhaps, but I assume this is needed in some way for ref locals and returns anyway. So is allowing ref void a possibility? Are there any problems with this that I have not thought of?

Would it then make sense to allow something like:

void* p = ....;
ref void v = ref Unsafe.AsRef<void>(p);

mikedn commented 8 years ago

Would it then not make sense to allow ref T to ref void?

I think it would make sense. And you can actually create a ref void in IL but you won't be able to call the method from C# exactly because it can't convert from, say, ref int to ref void.

The funny thing about this is that if the language allows such a conversion then it should probably allow many other conversion between ref types and that may make this particular As overload mostly useless. Granted, such conversions would be explicit and that would require additional design and implementation work.

ref void v = Unsafe.AsRef(p);

That wouldn't work unless the language is changed to allow void to be used as type argument. I think there were some discussions about that but nothing happened.

nietras commented 8 years ago

allow void to be used as type argument

Yes, it is a problem in making C# more functional. void as a proper type is very useful e.g. return type.

may make this particular As overload mostly useless

Not sure this would be true, with pointers there is still the issue that generic casts are not allowed. I would assume this would be the same for possible ref conversions. That is, an explicit one like (ref byte)ref i where ref int i will work for example, but in a generic context you cannot write (ref T)ref i or the equivalent for pointers (T*)p. This is why Unsafe is so useful, we can circumvent the restrictions imposed by C# (restrictions that are there for a good reason, usually ;))

Although, this is mainly due to the restriction of C# not supporting generic pointers, if generic refs are supported and casting between them is allowed in generic code, then yes the As would be useless, I think.

mikedn commented 8 years ago

but in a generic context you cannot write (ref T)ref i or the equivalent for pointers (T*)p

I don't see any reason why (ref T)ref i would not work in a generic context. Unlike (T)i ref conversions are no-op so they don't suffer from the usual generic conversion problems. That said, such conversions are inherently unsafe and would probably require an unsafe context. Probably it's best to leave them out of the language because of that.

At least conversions to ref void are safe. You can't do anything with the resulting reference except converting it back to some ref X and that needs to be done via Unsafe.As.

jamesqo commented 8 years ago

@nietras @mikedn Maybe we could go back to the earlier suggestion using a fluent API? It could look something like this (translated to IL of course):

public static Interpreter<T> Interpret<T>(ref T source) => new Interpreter<T>(ref source);

public struct Interpreter<T>
{
    private IntPtr _ptr;

    public Interpreter(ref T source)
    {
        _ptr = (IntPtr)source;
    }

    public ref U As<U>() => (ref U)_ptr;
}

That way we could use it like

ref int i = ref Unsafe.Interpret(ref b).As<int>();

@nietras mentioned earlier this could be 'bad from a perf perspective', but since the JIT should basically eliminate these copies in Release mode I don't see why not (even if it's a little more IL to write). It's able to do partial type inference and looks much better than the other prototype, which requires you to specify both types.

mikedn commented 8 years ago

Maybe we could go back to the earlier suggestion using a fluent API?

Ha ha, no thanks. I think I'll start hating fluent APIs with a passion. They have their uses but these days they're more like abuses.

jamesqo commented 8 years ago

@mikedn OK then, I'm going back to my earlier position about switching the type parameters. :)

Regarding the ref void / builtin ref cast discussion, I agree with you it's probably unlikely that those features will be added anytime soon; ref is used all the time for 'normal', non-unsafe code, e.g. Array.Resize and Monitor.Exit. Type safety would likely be more of a concern in that area.

jkotas commented 8 years ago

Unsafe.Interpret(ref b).As();

The implementation you have suggested would not even work. It has GC hole because of intermediate unmanaged pointer.

switching the type parameters

The prevalent order in the .NET APIs is "source, destination". The Copy methods on unsafe class are intentionally violating it to be in sync with the low-level order used in C and IL (discussion in dotnet/corefx#7966). Hard to come up with the "right" answer.

The different order between .NET and C is an endless source of mistakes when one is using both. E.g. ~~@omariom~~ @GSPP just fell into this trap in https://github.com/dotnet/coreclr/issues/6541.

omariom commented 8 years ago

@KrzysztofCwalina btw, CoreFxLab's Span has a Cast method. It is basically As but for slices.

I think single name should be selected for both because it is just scalar vs sequence.

imo, As better expresses "the same location, different interpretation".

omariom commented 8 years ago

public static bool ReferenceEquals(ref T a, ref T b);

string s1 = "foo";
string s2 = s1;

Unsafe.ReferenceEquals(ref s1, ref s2);
// vs
object.ReferenceEquals(s1, s2);

It confuses me. refs are not references. May be better to keep RefEquals? It aligns well with AsRef.
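The distinction in the snippet above can be run as a small sketch. It uses AreSame, the name the proposal eventually settled on, and assumes a runtime where System.Runtime.CompilerServices.Unsafe provides it; the extra variable names are illustrative.

```csharp
using System;
using System.Runtime.CompilerServices;

string s1 = "foo";
string s2 = s1;

// Both variables hold the same object reference...
bool sameObject = object.ReferenceEquals(s1, s2);

// ...but s1 and s2 are two distinct storage locations.
bool sameLocation = Unsafe.AreSame(ref s1, ref s2);

// A ref compared with itself is the same location.
bool selfSame = Unsafe.AreSame(ref s1, ref s1);

Console.WriteLine($"{sameObject} {sameLocation} {selfSame}");
```

The ref comparison answers "is this the same variable?", while object.ReferenceEquals answers "do these variables point at the same object?".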

GSPP commented 8 years ago

Regarding int elementOffset, I think IntPtr is appropriate to be forward compatible with future runtimes. Somewhere in the next 10 years we will likely need mainstream support for large arrays since big-memory scenarios are becoming gradually more common.

IntPtr is the "correct" type anyway. For such low level code that's probably alright from a convenience standpoint (lowered convenience is OK).

I don't think there should be int overloads at all. That gets awkward because there is no overloading on return type.

I'd split Subtract into GetElementDifference and GetByteDifference. That clarifies the meaning and provides a new, useful method as well.

ReferenceEquals should not have plural Equals. It should be ReferencesEqual. I'd call it AreReferencesEqual.

As is a very unspecific name. It also collides with the C# keyword as which means something else entirely. I'd call it UncheckedCast or Cast or ConvertReference.

jamesqo commented 8 years ago

@jkotas

It has GC hole because of intermediate unmanaged pointer.

Ah, you seem to be right. I don't know if IL allows you to store a T& as a field...

The Copy methods on unsafe class are intentionally violating it to be in sync with the low-level order used in C and IL (discussion in dotnet/corefx#7966). Hard to come up with the "right" answer.

Actually, in the end I don't think the order of the type parameters really matters; when people start typing the method into VSCode / Visual Studio, they should see the name of the type parameter pop up (e.g. TSource, TDestination) which should prevent confusion. Even if they still slip up the compiler will catch it for them (unlike the methods w/ regular parameters), since you can't implicitly convert a ref byte, say, to a ref int. I think whatever decision is made should be the best one for readability, and As<TDestination, TSource> seems to be better in that regard (assuming we don't change it to Cast / CastRef).

@omariom

imo, As better expresses "the same location, different interpretation".

If someone wrote

enumerable.As<int>();

I would think that they were somehow converting an IEnumerable to an int, whereas if someone wrote

enumerable.Cast<int>();

I think I'd better understand each element of the enumerable was being cast, although maybe that's just because it's the API we have today.

It confuses me. refs are not references.

That's actually a good point; I agree too that calling it RefEquals may be a good idea.

May be better to keep RefEquals? It aligns well with AsRef.

AsRef was shelved earlier since there's an existing overload AsRef that converts from pointer-to-ref. I'm still kinda hoping that the new method can be named something like CastRef rather than As though... :confused:

jamesqo commented 8 years ago

@GSPP

I think IntPtr is appropriate to be forward compatible with future runtimes.

I'm not too sure about the idea of returning/accepting an IntPtr, first of all as @jkotas mentioned there's not a lot you can do with it (e.g. multiply, divide are missing), it seems to imply that it points to a valid memory location when in fact it's just a number, etc. Plus, the pointer size is not always guaranteed to be the same as the pointer difference size for a given platform; C for example differentiates between size_t, ptrdiff_t, intptr_t, etc. and there are real cases where they differ.

I'd split Subtract into GetElementDifference and GetByteDifference.

Redundant, you can just do this for the byte difference:

var byteDifference = Unsafe.Subtract(ref a, ref b) * Unsafe.SizeOf<T>();

Guaranteed to not overflow (I believe) since the maximum number of bytes 2 pointers can be apart is 2^63 / 64 or somewhere around there. I also think the redundant div (in Subtract)/mul (in SizeOf) should be eliminated by the JIT. If it's not, it should be.

Maybe if the method names were a little shorter then it would be OK... SubtractElements, SubtractBytes maybe? Seems kinda verbose though.

@nietras Regarding Subtract, even though it's redundant (I said so myself earlier) I still think it may be worth including. Unsafe.Subtract(ref a, 6) is more readable than Unsafe.Add(ref a, -6), and C# doesn't force you to write ptr + -6. Plus, we already have another overload of Subtract.
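As a quick check of the redundancy being discussed, this sketch shows that Subtract with a positive offset and Add with the negated offset land on the same element. It assumes the Add, Subtract and AreSame methods as they appear in the final proposal; the array is illustrative.

```csharp
using System;
using System.Runtime.CompilerServices;

int[] data = { 0, 1, 2, 3, 4, 5, 6, 7 };
ref int last = ref data[7];

// Step six elements back, written both ways.
ref int viaSubtract = ref Unsafe.Subtract(ref last, 6);
ref int viaAdd = ref Unsafe.Add(ref last, -6);

// Both refs point at data[1].
bool equivalent = Unsafe.AreSame(ref viaSubtract, ref viaAdd);

Console.WriteLine($"{viaSubtract} {viaAdd} {equivalent}");
```

So the second overload adds no capability; the argument for keeping it is readability only.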

nietras commented 8 years ago

Lots of great input and food for thought. Naming is hard and my comment got pretty long again :|

As, Convert, Cast etc. naming

All these have existing meanings in .NET, none are without prior baggage.

As - already in use via as. A "referential" cast if possible or returns null. Only reference types are supported.
Convert - used throughout .NET to indicate conversion from one type to another, in most cases (in my view) with copying of state e.g. TypeConverter.ConvertTo, BitConverter etc.
Cast - or () used as both referential cast (e.g. (string)obj) and value casting (e.g. (int)float), in all cases a cast is checked and will fail if not appropriate.

I lack a proper terminology definition for .NET for talking about these in a consistent manner (does any exist?) so hope it is clear from context. Other possible wordings could be Reinterpret, Change and of course one can add suffix, prefix or other "fix"'es to these e.g. Ref, Reference, Unchecked all of which seem superfluous or redundant given the context Unsafe and how code would look as I previously mentioned, code will be littered with refs.

For the first version of Unsafe we chose As. In my opinion because As for Unsafe is closest in relationship to the as keyword, we unsafely reinterpret a pointer, object, ref as something else, and because of its brevity and readability, see below. All of these read naturally and are succinct, we are reinterpreting a value or object as a different type, we are not doing actual type conversions or value castings.

T As<T>(object o)
void* AsPointer<T>(ref T value)
ref T AsRef<T>(void* source)

That is why I still do not like Convert although it admittedly reads better i.e.

ref byte b = ...;
ref int i = ref Unsafe.As<byte, int>(ref b);
ref int i = ref Unsafe.Convert<byte, int>(ref b);

Yes, Convert reads more naturally, but is it converting the value as well? That leads to adding Ref to make it more clear, although there are refs all over. So it gets long.

Cast has the same issues. @jamesqo even gives an example of it with enumerable.Cast<int>() which does value casting, not referential casting, so Cast has too much baggage for me in C#.

I am sure there are inconsistencies in my argumentation here :) However, for me I would much rather stick with the existing verb As which has a clearer meaning in my view, even though it is not perfect ;)

I would then define the method as (if ref void is not possible):

ref TTo As<TFrom, TTo>(ref TFrom source)

which leads to the other suggestions for type parameter names and order. @jamesqo suggested TSource, TDestination. Good suggestions, but they are too long in my view; TFrom, TTo are better just due to brevity.

For the order of parameters I think one has to look to Func<> a type used all over .NET and used by most people. This has TResult last. This alone is reason enough for me to have it last for As as well, since I think it would be counterintuitive for new users. I do agree, though, that we definitely need the type parameters to be explicitly worded as T, U does not give enough meaning.

Wouldn't it be pertinent to ask the Roslyn team what their thoughts are on ref void? They surely have thought about this and it seems like a good addition.

ReferenceEquals, RefEquals, AreRefsEqual, AreEqual etc.

I agree ReferenceEquals is perhaps not the best choice anyway; as @omariom pointed out, "refs are not references", which is pretty much to the point and is why I miss a properly defined terminology. I could live with AreEqual since it is short and with ref in the code and under Unsafe the usage should be clear, but a good alternative is:

bool AreRefsEqual(ref T a, ref T b)

Subtract, Offset, Difference, Distance, Index for ref to ref etc.

I think all "iterator" operations (can't help but feel that we are pretty much implementing C++ iterator behaviour for refs, so perhaps inspiration can be found there? std uses distance as indicative of number of elements between first and last) should return element offsets. If byte offset is needed use SizeOf.

I think Offset works better than Subtract, see next section.

Whether the offset should be IntPtr or not I am not sure, but it is definitely a problem that IntPtr does not support arithmetic on it. A sore point for C# in my view, there is no "native" integer type... an oversight in my view.

Subtract(ref, int)

I can live with adding this as well as Add, but this would give stronger support into not naming Subtract(ref, ref) well... Subtract but instead DistanceTo, Offset or similar. Currently, I prefer Offset as it is short, and we constantly keep saying element offset, when talking about add and subtract so should the offset between two refs not be found with the Offset method? Alternatively OffsetBetween.

GSPP commented 8 years ago

Maybe just add * and / to IntPtr? I think IntPtr should behave like any other integer type as much as possible. I saw proposed C# language changes about that on the Roslyn Github presence.

The pointer difference representation can safely be IntPtr on all platforms. The CLR can just promise to make that work. I see no issues with that.

benaadams commented 8 years ago

public static ref U AsRef<T, U>(ref T source);

Is same format used for Vector reinterpret

public static Vector<Byte> AsVectorByte<T>(Vector<T> value) where T : struct
public static Vector<Single> AsVectorSingle<T>(Vector<T> value) where T : struct

Maybe just add * and / to IntPtr? I think IntPtr should behave like any other integer type as much as possible.

https://github.com/dotnet/corefx/issues/10457 Operators should be exposed for System.IntPtr and System.UIntPtr

VSadov commented 8 years ago

@jkotas considering that stack always grows towards the heap and on the vast majority of current systems stack grows downwards, I think IsOnStack could be pretty portable.

IsOnStack could just take a ref of a dummy local and pointer-compare with the given ref. If the given ref points to a higher location, its referent cannot be on the heap.

We can ignore cases where the given ref points to a stack frame of a different thread or to a dead frame in the current stack, or to a kernel mode segment. Having such refs is a bug by itself and IsOnStack would not make much sense in those scenarios.

Note that with two dummy locals in two frames, the direction of stack growth can be detected dynamically, so the whole thing could be made insensitive to the "downwards" part and would only require that the stack continuously grows towards the heap. However, I think that is overkill and "downwards" is a safe assumption.

jkotas commented 8 years ago

considering that stack always grows towards the heap

Stacks of different threads can be interleaved with GC heap segments. It is actually pretty common to have this situation in large workloads (on Windows at least).

VSadov commented 8 years ago

Stacks of different threads can be interleaved with GC heap segment

I did not know about this. I always assumed that the OS allocates stack segments in the higher addresses, separately from heaps. I think I might have seen code in the past that relies on such assumptions. Interesting...

jkotas commented 8 years ago

Here is the updated proposal with feedback incorporated:

public static class Unsafe
{
    public static ref TTo As<TFrom,TTo>(ref TFrom source);

    public static ref T Add<T>(ref T source, int elementOffset);
    public static ref T Subtract<T>(ref T source, int elementOffset);

    public static bool AreSame<T>(ref T a, ref T b);
}

Keeping <TFrom,TTo> order for As as it is preferred by more people. Renaming the generic arguments for clarity.
int overloads for Add/Subtract are needed to make this reasonably usable today. IntPtr overloads for Add/Subtract can be added later without any harm if/once native int becomes better supported in C#.
Keeping Subtract(ref,int) for convenience, even though it is redundant.
Omitting Subtract(ref,ref) for now because it is not very useful with byref locals and returns anyway, and there are naming and design issues around it.
Renaming ReferenceEquals to AreSame. I have checked about a good name with @KrzysztofCwalina and he suggested this name. It is being used for similar concepts in other places and I like it the most out of all the options discussed.
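
As an illustration of how the operations in the updated proposal fit together, here is a minimal sketch. It assumes the System.Runtime.CompilerServices.Unsafe class exposes As, Add, Subtract and AreSame with the signatures listed above; the UnsafeRefDemo class and the ElementAt helper are hypothetical names made up for this example.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class UnsafeRefDemo
{
    // Hypothetical helper: returns a ref to the element `offset` positions
    // after the first one. There are no bounds checks, so the caller must
    // keep the result inside the array.
    public static ref int ElementAt(int[] values, int offset)
        => ref Unsafe.Add(ref values[0], offset);

    public static void Main()
    {
        int[] values = { 10, 20, 30 };

        ref int first = ref values[0];
        ref int third = ref ElementAt(values, 2);
        ref int back = ref Unsafe.Subtract(ref third, 2);

        Console.WriteLine(third);                               // 30
        Console.WriteLine(Unsafe.AreSame(ref first, ref back)); // True

        // Reinterpret the same storage as uint; the write lands in values[0].
        ref uint asUInt = ref Unsafe.As<int, uint>(ref first);
        asUInt = 42;
        Console.WriteLine(values[0]);                           // 42
    }
}
```

Note that Subtract(ref, n) here is just Add(ref, -n), which is why the proposal calls it redundant but convenient.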

jamesqo commented 8 years ago

@nietras

Wouldn't it be pertinent to ask the Roslyn team what their thoughts are on ref void? They surely have thought about this and it seems like a good addition.

You can probably ask, but I am 99% sure the answer will be no; I don't think they're going to be very keen on adding further unsafe features to C#, e.g. they chose ref returns over generic pointers. Besides this contrived use case, what other uses could ref void possibly have in safe code? void* is only useful (mostly) since we don't have generic pointers.

ReferenceEquals, RefEquals, AreRefsEqual, AreEqual etc.

~~I think RefEquals may be best here. If Object has ReferenceEquals, we should have RefEquals. If Object had AreReferencesEqual, we should have AreRefsEqual.~~

Subtract, Offset, Difference, Distance, Index for ref to ref etc.

~~I think Difference is best. C# allows you to subtract a pointer from a pointer, as well as subtract an integer from a pointer. The name should be closely related to subtraction.~~

~~It would also avoid confusion, since the parameter order of other methods in the class is dest before src. Therefore, with a name like Offset~~

Was writing this post when I saw @jkotas' update.... :) Everything in the updated proposal looks good.

@jkotas I really like AreSame. Reminds me of xUnit's Assert.Same... so :+1: from me.

VSadov commented 8 years ago

To avoid specifying two type parameters in As, it might be possible to use the Cast.From trick.

public class Unsafe
{
    public static class As<TTo>
    {
        public static ref TTo From<TFrom>(ref TFrom source)
        {
            //magic
        }
    }
    . . .
}

int x = 1;
ref uint y = ref Unsafe.As<uint>.From(ref x); //TFrom is inferred from the arg

jamesqo commented 8 years ago

@VSadov This appears to be giving a compiler error: http://tryroslyn.azurewebsites.net/#K4Zwlgdg5gBAygTxAFwKYFsDcAoADsAIwBswBjGUogQxBBgGEYBvbGNmfYsmANwHswAExgBBEAB4AKgD4AFHwIArVKWQwAZnz4BKZq3YBffW2MdCJcpRp0xU6aZbsYRg0A==

VSadov commented 8 years ago

@nietras 'ref' in C# does not apply to the type, it applies to the signature of the method. ref int Foo() is still considered as having a type of 'int', just instead of returning a value of some variable, it returns an alias of the variable itself. It is observable in type-specific scenarios such as overload resolution or type inference. If you have overloaded methods Test(int) and Test(char), you can do Test(Foo()) and the method that takes int will be called. That is because the return type of Foo is int and for all purposes it works as a method that returns an int. The part that it is 'ref' just makes it a variable/LValue, so you can do some extra stuff with it - like passing it by reference or assign to it.

From that perspective, 'ref void M()' would not make much sense. A void method does not return anything. And 'ref void' does that by reference?

VSadov commented 8 years ago

@jamesqo Compiles for me: http://tryroslyn.azurewebsites.net/#b:master/AQ4YwGwQwZx4AKAnA9gcyVAtgKFMAbz31BgBcoyBLMYANxSoBNgBZKKgOwApyku0AbQC6wKEjQwAlMRKFZc0FzLAAHsAC8wAIwBuBYuBIApgDNgAV2XAAnpqNngAVU4wop4wDoAgjAA8VpxkAHyeAGKoWNwm5qpS+oYAvgrJCgAOFgBGEDTg0HDOru7GCkSGGdm55JS5kLDwvn4AKk0owQagZYagFTm01dS0McAtKMARKFjNE1jB0Y5NM8AwKBZIYMYy3Z0d2wD0e0wWWFh2VFhpEMZYxkE1KJy73SZka5wO5tycxgDuI62CbTCKSCAAMwgS22AySSKSAA==

class Program
{
    static void Main(string[] args)
    {
        int x = 1;
        ref uint y = ref Unsafe.As<uint>.From(ref x);
    }
}

public class Unsafe
{
    public static class As<TTo>
    {
        public static ref TTo From<TFrom>(ref TFrom source)
        {
            //dummy implementation
            return ref (new TTo[1])[0];
        }
    }
}

jamesqo commented 8 years ago

@VSadov There is already an existing method Unsafe.As<T> that converts from an object to a T.

VSadov commented 8 years ago

@jamesqo - I did not realize that Unsafe is an already existing class and has something in it. Anyways, it is just a suggestion. It could be implemented with a different name or not at all. The updated proposal seems good enough actually.

Not having to specify two types in 'As' would have mostly aesthetic value. In reality you are still supplying two type arguments; by splitting them between the type and the method, you let the compiler infer TFrom from the argument.

jamesqo commented 8 years ago

@nietras Since it's possible to elide specifying both types if we have an inner class, do you still stick to your earlier position of using As?

nietras commented 8 years ago

AreSame

I really like this too, much better. MSTest uses it as well. In fact object.ReferenceEquals should have been named AreSame too. :+1:

do you still stick to your earlier position of using As?

@jamesqo good question. I did think a bit about it before, but didn't think there was precedent for doing something like that. I would specify it as:

public class Unsafe
{
    public static class To<TTo>
    {
        public static ref TTo From<TFrom>(ref TFrom source) { ... }
    }
}

The question then is, whether:

int x = 1;
ref uint y = ref Unsafe.To<uint>.From(ref x);

is better than:

int x = 1;
ref uint y = ref Unsafe.As<int, uint>(ref x);

? In this case, we don't really save much typing. To/From does read better and has the benefit of not having to be explicit about the TFrom type, which makes it more flexible. However, I also like that all reinterpretations start with As, since this makes discovery easier. I could live with both, but am not sure I prefer one over the other. :neutral_face:

Void method does not return anything. And ref void does that by reference?

@VSadov couldn't the same argument be made with void*? We return a void by pointer?

void* Foo(int*)

This seems contrary to the logic that void should mean nothing; in this case it rather means "typeless", while it should really mean "of type void", as in F#. ref void is then just a typeless ref. I am not sure I follow completely: "ref int Foo() is still considered as having a type of int" is hard for me to grasp. Is this just for overload resolution or other specific scenarios? Does this exclude ref void as an input parameter? I understand that returning void is treated as having no return value, but should ref void really be treated the same way? Isn't ref void closer to void* than to void itself?

I would say ref void method returns a ref primarily but then with no type or type void. I probably do not understand all the issues here...

benaadams commented 8 years ago

int x = 1;
ref uint y = ref Unsafe.To<uint>.From(ref x);

Doesn't suggest reinterpret "in-place", but transfer and change.

void* is a pointer to an unknown type. You can't do pointer arithmetic on it because the size of the pointee is unknown, and it must be cast to a typed pointer before dereferencing (C++):

void* ptr;
void* ptr2 = ptr + 1;                  // nope
void* ptr3 = (void*)((char*)ptr + 1);  // ok
void thing1 = *ptr;                    // nope
auto thing2 = *ptr;                    // nope
auto thing3 = *(char*)ptr;             // ok

Is what you are asking for the ability to cast a ref to a pointer? That is already an operator in C#: &

int x = 1;
ref uint y = ref Unsafe.As<int, uint>(ref x);
uint* pY = &y; // cast to pointer
void* pV = (void*)&y; // cast to void pointer

nguerrera commented 8 years ago

I like this proposal. One small question: is there anything actually unsafe about AreSame<T>(ref T, ref T)? Should we consider putting it somewhere less scary?

benaadams commented 8 years ago

ValueType.ReferencesEqual<T>(ref T, ref T) :trollface:

nietras commented 8 years ago

Doesn't suggest reinterpret "in-place", but transfer and change.

That is true. I guess it could be improved by calling it ToRef, and perhaps even FromRef, but then it gets even more verbose than As.

Is what you are asking the ability to cast a ref to a ptr?

No, and pointers wouldn't support the scenarios that ref can anyway, since generic pointers are not supported. In addition, converting to pointers may cause a GC hole, since pointers are not "tracked" by the GC, but refs are, as far as I understand.

anything actually unsafe about AreSame(ref T, ref T)

How would you write it using normal C# code? Although, that of course is not the same as it needing to be unsafe...

Omitting Subtract(ref,ref) for now because it is not very useful with byref locals and returns anyway, and there are naming and design issues around it.

@jkotas Been thinking about this. Can't ref be used for both managed and unmanaged memory? How is that handled in regards to the GC?

KrzysztofCwalina commented 8 years ago

I think the parameters to AreSame should be called "left" and "right". This is to mimic conventions we use for operator== overloads.

benaadams commented 8 years ago

How would you write it using normal C# code?

For ints

unsafe bool AreSame(ref int left, ref int right)
{
    fixed (int* pLeft = &left)
    fixed (int* pRight = &right)
    {
        return pLeft == pRight;
    }
}

Can't really do it for generic types?

Though it might be more of a normal thing to test (rather than an unsafe one); for example, if you were given two structs from an array, you might want to check whether they are the same one.
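
For the generic case, which the fixed-pointer version above cannot express, a sketch along the following lines might work. It assumes Unsafe.AreSame<T>(ref T, ref T) from the proposal is available; Point, AreSameDemo and IsSameSlot are hypothetical names used only for this example.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct Point
{
    public int X;
    public int Y;
}

public static class AreSameDemo
{
    // Hypothetical generic wrapper: true only when both refs alias the same
    // storage location, regardless of whether the stored values are equal.
    public static bool IsSameSlot<T>(ref T left, ref T right)
        => Unsafe.AreSame(ref left, ref right);

    public static void Main()
    {
        var points = new Point[2]; // both elements hold identical (zeroed) values

        Console.WriteLine(IsSameSlot(ref points[0], ref points[0])); // True
        Console.WriteLine(IsSameSlot(ref points[0], ref points[1])); // False
    }
}
```

The second call returns False even though the two structs compare equal by value, which is the "are they the same one" check described above.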

omariom commented 8 years ago

Can refs be zero (null)?

jkotas commented 8 years ago

Can't ref be used for both managed and unmanaged memory? How is that handled in regards to the GC?

Yes, refs can be used for both managed and unmanaged memory. The GC does not touch them if they point to unmanaged memory. Basically, the algorithm for refs during GC root scanning is: if (does pointer point into GC heap) { track the pointer }. It is also the reason why it is problematic to have refs stored as fields of GC-heap-allocated objects: the "does pointer point into GC heap" check is an expensive operation. Stacks are relatively small, so restricting refs to the stack keeps the cost acceptable.

parameters to AreSame should be called "left" and "right"

Fixed.

Can refs be zero?

Yes.

omariom commented 8 years ago

In unsafe context only?

Would be interesting to have Unsafe.IsNullRef(ref valueRef);. Is it too much? )

update: Maybe it is just easier to use pointers then.
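
For reference, a null-ref check along these lines can be sketched as below. It assumes a library version that ships Unsafe.NullRef<T>() and Unsafe.IsNullRef<T>(ref T), which to my knowledge arrived in later releases (.NET 5 onwards) and are not part of the proposal in this thread; NullRefDemo and HasTarget are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class NullRefDemo
{
    // Hypothetical helper: reports whether a ref points at real storage.
    public static bool HasTarget(ref int value)
        => !Unsafe.IsNullRef(ref value);

    public static void Main()
    {
        int x = 5;
        ref int none = ref Unsafe.NullRef<int>(); // a ref that points at address zero

        Console.WriteLine(HasTarget(ref x));    // True
        Console.WriteLine(HasTarget(ref none)); // False

        // Reading or writing through `none` would throw NullReferenceException,
        // so it must only ever be tested, never dereferenced.
    }
}
```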
