dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.36k stars 4.75k forks source link

Enumerating type safety guarantees in MemoryMarshal and friends #41418

Open GrabYourPitchforks opened 4 years ago

GrabYourPitchforks commented 4 years ago

I realize that with Unsafe, MemoryMarshal, and friends entering wide use, we never formally stated what type safety guarantees (if any) the APIs offer. That is, we never provided a list of what APIs are "safe" and which are "unsafe equivalents" (related: https://github.com/dotnet/runtime/issues/31354).

This issue is an attempt to enumerate the APIs on Unsafe, MemoryMarshal, and related types from the perspective of type safety / memory safety. Ultimately I think this needs to be tracked somewhere, but whether that's a .md file in this repo or an official doc page I don't really know.

I'm also soliciting feedback on this list. Please let me know if I got something wrong.

I'm not listing APIs which expose raw pointers through their public API surface. APIs which take or return pointers are always assumed unsafe.

Reminder: "Unsafe equivalent" does not mean "must not be used." Rather, it means that the API can be used to violate a type's contract or will bypass the runtime's normal type safety checks. Think of these APIs as being equivalent to using the unsafe keyword within your code. If you are acting as a code reviewer, give extra scrutiny to calls to these APIs, as you would give extra scrutiny to any code involving pointers or other unsafe constructs.

System.Runtime.CompilerServices.Unsafe

API safe or unsafe See notes
Add<T>(ref T, int) unsafe equivalent (1)
Add<T>(ref T, IntPtr) unsafe equivalent (1)
AddByteOffset<T>(ref T, IntPtr) unsafe equivalent (1)
AreSame<T>(ref T, ref T) safe (2)
AsRef<T>(in T) unsafe equivalent (3), (22)
As<T>(object) unsafe equivalent (4)
As<TFrom, TTo>(ref TFrom) unsafe equivalent (4)
ByteOffset<T>(ref T, ref T) unsafe equivalent (5)
CopyBlock(ref byte, ref byte, uint) unsafe equivalent (6)
CopyBlockUnaligned(ref byte, ref byte, uint) unsafe equivalent (6)
InitBlock(ref byte, ref byte, uint) unsafe equivalent (6)
InitBlockUnaligned(ref byte, ref byte, uint) unsafe equivalent (6)
IsAddressGreaterThan<T>(ref T, ref T) safe (2)
IsAddressLessThan<T>(ref T, ref T) safe (2)
IsNullRef<T>(ref T) safe (2), (10)
NullRef<T>() safe (2), (10)
ReadUnaligned<T>(ref byte) unsafe equivalent (4)
SkipInit<T>(out T) unsafe equivalent (7)
SizeOf<T>() safe (8)
Subtract<T>(ref T, int) unsafe equivalent (1)
Subtract<T>(ref T, IntPtr) unsafe equivalent (1)
SubtractByteOffset<T>(ref T, IntPtr) unsafe equivalent (1)
Unbox<T>(object) unsafe equivalent (9)
WriteUnaligned<T>(ref byte) unsafe equivalent (4)

System.Runtime.InteropServices.MemoryMarshal

API safe or unsafe See notes
AsBytes<T>(ReadOnlySpan<T>) unsafe equivalent (11)
AsBytes<T>(Span<T>) unsafe equivalent (11)
AsMemory<T>(ReadOnlyMemory<T>) unsafe equivalent (3)
AsRef<T>(ReadOnlySpan<byte>) unsafe equivalent (11), (12)
AsRef<T>(Span<byte>) unsafe equivalent (11), (12)
Cast<TFrom, TTo>(ReadOnlySpan<T>) unsafe equivalent (11), (12), (14)
Cast<TFrom, TTo>(Span<T>) unsafe equivalent (11), (12), (14)
CreateFromPinnedArray<T>(T[], int, int) unsafe equivalent (15)
CreateReadOnlySpan<T>(ref T, int) unsafe equivalent (6)
CreateSpan<T>(ref T, int) unsafe equivalent (6)
GetArrayDataReference<T>(T[]) unsafe equivalent (16)
GetReference<T>(ReadOnlySpan<T>) unsafe equivalent (3), (16), (22)
GetReference<T>(Span<T>) unsafe equivalent (16)
Read<T>(ReadOnlySpan<byte>) unsafe equivalent (11), (13)
ToEnumerable<T>(ReadOnlyMemory<T>) safe
TryGetArray<T>(ReadOnlyMemory<T>, ...) unsafe equivalent (3), (17), (18)
TryGetMemoryManager<T>(ReadOnlyMemory<T>, ...) unsafe equivalent (3), (17), (18)
TryGetString<T>(ReadOnlyMemory<char>, ...) safe (17), (18)
TryRead<T>(ReadOnlySpan<byte>, out T) unsafe equivalent (11), (13)
TryWrite<T>(Span<byte>, ref T) unsafe equivalent (11), (13)
Write<T>(Span<byte>, ref T) unsafe equivalent (11), (13)

System.Runtime.InteropServices.SequenceMarshal

API safe or unsafe See notes
TryGetArray<T>(...) unsafe equivalent (3), (17), (18)
TryGetReadOnlyMemory<T>(...) safe
TryGetReadOnlySequenceSegment<T>(...) safe
TryRead<T>(...) unsafe equivalent (11), (13)

System.Runtime.InteropServices.CollectionsMarshal

API safe or unsafe See notes
AsSpan<T>(List<T>) safe (21)

System.GC

API safe or unsafe See notes
AllocateArray<T>(int, bool) safe
AllocateUninitializedArray<T>(int, bool) unsafe equivalent (7)

GetPinnableReference pattern

Though GetPinnableReference methods are intended for compiler use within fixed blocks, they're designed to be type-safe when called by hand.

API safe or unsafe See notes
string.GetPinnableReference() safe (19)
ReadOnlySpan<T>.GetPinnableReference() safe (20)
Span<T>.GetPinnableReference() safe (20)

Miscellaneous

API safe or unsafe See notes
ArrayPool<T>.Shared.Rent(int) unsafe equivalent (7)
MemoryPool<T>.Shared.Rent(int) unsafe equivalent (7)

Notes

In the below notes, I'm using the terms gcref and managed pointer interchangeably.

jkotas commented 4 years ago
GrabYourPitchforks commented 4 years ago

@jkotas Those are both interesting points. There are lots of APIs which fall over if you create them from a poisoned source. ToEnumerable - as you had mentioned - is one example. But even Memory<T>.get_Span can be considered dangerous if both of the following are true: (a) the instance is backed by a MemoryManager<T>, and (b) the instance was torn somehow. Then the resulting span can point to garbage memory.

But since Memory<T> is an abstraction it's nigh impossible to tell during a code review if the instance you have might be poisoned. So I was focusing more on the creation aspect: APIs which potentially allow creating a poisoned source are "unsafe", while APIs which deal with wrapping a Memory<T> around a T[] or string or whatever are safe. Sure - they can still be torn - but they're guaranteed never to buffer overrun. This also helps narrow where code reviewers have to look.

Re: ArrayPool<T>.Shared.Return double-return, would you consider this a type safety / memory safety violation, or would it be some other kind of violation? If I create a shared static byte[] and multiple threads party on it concurrently, the app I wrote is certainly questionable. 🙂 But it'd still pass peverify and it's still type-safe in the typical definition. (I'm hand-waving away the fact that multithreading is already inherently unsafe, depending on who you ask.)

I'm also curious about IsNullRef and NullRef. From a strict ECMA-335 reading, they're in violation, but in practice they're totally fine. Curious as to whether others had thoughts on this.

EgorBo commented 4 years ago

but in practice they're totally fine.

reminds me Unsafe.NullRef<DateTime>().ToString() leading to a runtime crash 😄

jkotas commented 4 years ago

ArrayPool.Shared.Return double-return, would you consider this a type safety / memory safety violation, or would it be some other kind of violation?

I think it is in the same category as Read<T>(ReadOnlySpan<byte>) or AsRef<T>(in T). It won't corrupt or crash the core runtime. The incorrect use will introduce inconsistent state with very bad consequences for the program as a whole.

ArrayPool<T>.Shared.Return is actually worse than the other APIs because of it will cause data corruptions in unrelated part of the program.

strict ECMA-335 reading

I would not read into verifiability rules in ECMA-335 too much. The rules have not been updated with evolution of the runtime. I think it may be better to explain all the reasoning here from the first principles.

reflectronic commented 4 years ago

ECMA-335 clearly states "Managed pointers cannot be null" (see §II.14.4.2). It does not take a very strict reading of the specification to see that they are, in theory, completely disallowed. This is likely one of the parts that can be changed when the team figures out how to rev the specification. Maybe this could be added to the ECMA-335 augments doc?

GSPP commented 4 years ago

What standard is being used to make an API unsafe? As discussed, (7) does not allow to subvert the type system. It merely introduces state corruption which any number of things can do (for example, struct tearing). If the definition of "unsafe" is expanded to "anything that seems rather dangerous but might be correct" then the set grows considerably and it's often unclear what should be included. Another example would be OS handle access. That can cause "bugs at a distance".

GrabYourPitchforks commented 4 years ago

it's often unclear what should be included

Oh, definitely. That's why I didn't include ArrayPool<T>.Shared.Return in the original list. Sure, misuse can cause global state corruption in the application, but I can't categorize it as type system subversion.

FWIW, (7) absolutely does allow subverting the type system. Consider the following application.

public readonly struct MyStruct
{
    private readonly sbyte _myByte;
    public MyStruct(sbyte value)
    {
        if (value < 0) { throw new ArgumentOutOfRangeException(); }
        _myByte = value;
    }

    public override string ToString()
    {
        if (_myByte < 0) { Environment.FailFast("What just happened?"); }
        return _myByte.ToString();
    }
}

public void MyMethod()
{
    MyStruct[] arr = ArrayPool<MyStruct>.Shared.Rent(1024);
    foreach (MyStruct value in arr)
    {
        Console.WriteLine(value); // this might Environment.FailFast
    }
}

Structs of less than one machine word in size cannot be torn through standard multithreaded access. There's no way to get the _myByte field to be negative without having bypassed normal ctor validation. (This is true more generally: even for torn structs, it is never possible for any single field to have a "never legal" value without first having bypassed ctor validation to create a standalone instance.)

jkotas commented 4 years ago

What standard is being used to make an API unsafe?

I think it should be the same standard that was used to mark API as SecurityCritical in Silverlight.

GSPP commented 4 years ago

OK, I see what you mean. Essentially random bytes can be "blitted" over a struct. So you're saying, if a struct is no longer able to guard its state that counts as subverting the type system.

benaadams commented 4 years ago

OK, I see what you mean. Essentially random bytes can be "blitted" over a struct. So you're saying, if a struct is no longer able to guard its state that counts as subverting the type system.

Would put reflection setters of private members also in that category?

jkotas commented 4 years ago

Yes, any private reflection is in this category. This issue is specific to MemoryMarshal and friends. The platform as a whole has a lot more unsafe or partially unsafe APIs.

GrabYourPitchforks commented 4 years ago

I think it should be the same standard that was used to mark API as SecurityCritical in Silverlight.

This could potentially be a bit too broad. I'm pretty sure ThreadPool.QUWI was a critical function in Silverlight. And while I could sit here and argue that multi-threading is technically a vehicle for violating runtime invariants, you all would rightfully smack me silly for suggesting we mark QUWI as unsafe. 🙂

jkotas commented 4 years ago

Agree. I meant just the reasons related to type and memory safety. The other reasons like CAS (where ThreadPool.QUWI falls into) or I/O are not relevant here.

GrabYourPitchforks commented 3 years ago

Since we're talking about MemoryMarshal and ArrayPool type safety, might help to create some definitions as part of the doc effort.

An unmanaged type is any value type which does not contain GC-tracked references. (See the C# unmanaged keyword and the API RuntimeHelpers.IsReferenceOrContainsReferences). A non-exhaustive list of unmanaged types is: bool, int, nint (IntPtr), double, decimal, DateTime, and Guid.

A full-range unmanaged type is any value type where any arbitrary backing bit pattern is legal (verifiably type-safe) for an instance of that type. The set of full-range unmanaged types is a subset of the set of unmanaged types.

The difference is best demonstrated through examples.

This matters for two reasons. First, MemoryMarshal can be used to convert arbitrary bit patterns to instances of unmanaged types. For full-range unmanaged types this is fine since all possible bit patterns are valid. But if MemoryMarshal is used to convert an arbitary bit pattern to an instance of DateTime, decimal, bool, or some other unmanaged type that isn't a full-range unmanaged type, it could result in a type safety violation. The resulting instance won't have gone through constructor validation, and if the backing pattern is not well-formed then consumers of this instance could experience undefined behaviors at runtime, including app crashes, infinite loops, or security bypasses. Consumers who use MemoryMarshal as a glorified reinterpret_cast\<> operator should only ever use it to construct full-range unmanaged types, or they should perform validation upfront to ensure that the bit pattern is legal for the instance it's being copied into.

Second, the shared ArrayPool is not guaranteed to zero-init the backing memory of arrays that it returns to callers. For example, this could result in ArrayPool<bool>.Shared.Rent returning a bool[] whose elements have values other than 1 or 0. (It is not possible to construct such a bool[] using standard safe C# code.) Consumers of ArrayPool should never attempt to dereference array elements that they did not initialize themselves.

This is a bit nuanced, and I'm trying to figure out a good way to weave these concepts into the docs being created. Otherwise I think our guidance on how to use these APIs correctly will be lacking.

(Yes, I know you can tear structs to violate ctor invariants. But that's a rabbit hole I don't want to go down here.)

GSPP commented 3 years ago

A bool with a value 2 .. 255 will never naturally occur in the runtime

AFAIK, this is totally legal in the CLR. C# requires unsafe code to construct such a bool (and indeed C# will malfunction when encountering such bools with operator &). It is purely a C# rule, though.