Open GrabYourPitchforks opened 4 years ago
ArrayPool<T>.Shared.Return
is unsafe too: Calling it more than once for given instance will lead to arbitrary state corruption.
ToEnumerable<T>(ReadOnlyMemory<T>)
and similar APIs are interesting case. If the memory was created from unmanaged source, the result is unsafe. Should it be considered unsafe by default since you cannot tell in general?
@jkotas Those are both interesting points. There are lots of APIs which fall over if you create them from a poisoned source. ToEnumerable
- as you had mentioned - is one example. But even Memory<T>.get_Span
can be considered dangerous if both of the following are true: (a) the instance is backed by a MemoryManager<T>
, and (b) the instance was torn somehow. Then the resulting span can point to garbage memory.
But since Memory<T>
is an abstraction it's nigh impossible to tell during a code review if the instance you have might be poisoned. So I was focusing more on the creation aspect: APIs which potentially allow creating a poisoned source are "unsafe", while APIs which deal with wrapping a Memory<T>
around a T[]
or string
or whatever are safe. Sure - they can still be torn - but they're guaranteed never to buffer overrun. This also helps narrow where code reviewers have to look.
Re: ArrayPool<T>.Shared.Return
double-return, would you consider this a type safety / memory safety violation, or would it be some other kind of violation? If I create a shared static byte[]
and multiple threads party on it concurrently, the app I wrote is certainly questionable. 🙂 But it'd still pass peverify and it's still type-safe in the typical definition. (I'm hand-waving away the fact that multithreading is already inherently unsafe, depending on who you ask.)
I'm also curious about IsNullRef
and NullRef
. From a strict ECMA-335 reading, they're in violation, but in practice they're totally fine. Curious as to whether others had thoughts on this.
but in practice they're totally fine.
reminds me Unsafe.NullRef<DateTime>().ToString()
leading to a runtime crash 😄
ArrayPool
.Shared.Return double-return, would you consider this a type safety / memory safety violation, or would it be some other kind of violation?
I think it is in the same category as Read<T>(ReadOnlySpan<byte>)
or AsRef<T>(in T)
. It won't corrupt or crash the core runtime. The incorrect use will introduce inconsistent state with very bad consequences for the program as a whole.
ArrayPool<T>.Shared.Return
is actually worse than the other APIs because of it will cause data corruptions in unrelated part of the program.
strict ECMA-335 reading
I would not read into verifiability rules in ECMA-335 too much. The rules have not been updated with evolution of the runtime. I think it may be better to explain all the reasoning here from the first principles.
ECMA-335 clearly states "Managed pointers cannot be null" (see §II.14.4.2). It does not take a very strict reading of the specification to see that they are, in theory, completely disallowed. This is likely one of the parts that can be changed when the team figures out how to rev the specification. Maybe this could be added to the ECMA-335 augments doc?
What standard is being used to make an API unsafe? As discussed, (7) does not allow to subvert the type system. It merely introduces state corruption which any number of things can do (for example, struct tearing). If the definition of "unsafe" is expanded to "anything that seems rather dangerous but might be correct" then the set grows considerably and it's often unclear what should be included. Another example would be OS handle access. That can cause "bugs at a distance".
it's often unclear what should be included
Oh, definitely. That's why I didn't include ArrayPool<T>.Shared.Return
in the original list. Sure, misuse can cause global state corruption in the application, but I can't categorize it as type system subversion.
FWIW, (7) absolutely does allow subverting the type system. Consider the following application.
public readonly struct MyStruct
{
private readonly sbyte _myByte;
public MyStruct(sbyte value)
{
if (value < 0) { throw new ArgumentOutOfRangeException(); }
_myByte = value;
}
public override string ToString()
{
if (_myByte < 0) { Environment.FailFast("What just happened?"); }
return _myByte.ToString();
}
}
public void MyMethod()
{
MyStruct[] arr = ArrayPool<MyStruct>.Shared.Rent(1024);
foreach (MyStruct value in arr)
{
Console.WriteLine(value); // this might Environment.FailFast
}
}
Structs of less than one machine word in size cannot be torn through standard multithreaded access. There's no way to get the _myByte field to be negative without having bypassed normal ctor validation. (This is true more generally: even for torn structs, it is never possible for any single field to have a "never legal" value without first having bypassed ctor validation to create a standalone instance.)
What standard is being used to make an API unsafe?
I think it should be the same standard that was used to mark API as SecurityCritical in Silverlight.
OK, I see what you mean. Essentially random bytes can be "blitted" over a struct. So you're saying, if a struct is no longer able to guard its state that counts as subverting the type system.
OK, I see what you mean. Essentially random bytes can be "blitted" over a struct. So you're saying, if a struct is no longer able to guard its state that counts as subverting the type system.
Would put reflection setters of private members also in that category?
Yes, any private reflection is in this category. This issue is specific to MemoryMarshal and friends. The platform as a whole has a lot more unsafe or partially unsafe APIs.
I think it should be the same standard that was used to mark API as SecurityCritical in Silverlight.
This could potentially be a bit too broad. I'm pretty sure ThreadPool.QUWI
was a critical function in Silverlight. And while I could sit here and argue that multi-threading is technically a vehicle for violating runtime invariants, you all would rightfully smack me silly for suggesting we mark QUWI as unsafe. 🙂
Agree. I meant just the reasons related to type and memory safety. The other reasons like CAS (where ThreadPool.QUWI
falls into) or I/O are not relevant here.
Since we're talking about MemoryMarshal
and ArrayPool
type safety, might help to create some definitions as part of the doc effort.
An unmanaged type is any value type which does not contain GC-tracked references. (See the C# unmanaged keyword and the API RuntimeHelpers.IsReferenceOrContainsReferences
). A non-exhaustive list of unmanaged types is: bool
, int
, nint
(IntPtr
), double
, decimal
, DateTime
, and Guid
.
A full-range unmanaged type is any value type where any arbitrary backing bit pattern is legal (verifiably type-safe) for an instance of that type. The set of full-range unmanaged types is a subset of the set of unmanaged types.
The difference is best demonstrated through examples.
int
is a full-range type. An int is 32 bits in size, and any one of the 2\^32 ways to fill these 32 bits is a legal int.
DateTime
is an unmanaged type but is not a full-range type. A DateTime is backed by a 64-bit integer, but the bit patterns have special meaning within the DateTime logic, and DateTime's constructor validates that the caller has not provided an invalid bit flag pattern.
Guid
is a full-range type. A Guid is 128 bits in size, and any one of the 2\^128 ways to fill these 128 bits results in a well-defined Guid.
double
is a full-range type. A double is 64 bits in size, and any one of the 2\^64 ways to fill these bits results in a well-defined double. Note: many of these values might resolve to special values like NaN or Infinity, but those values are still legal doubles, and all double operations are well-defined when consuming such values.
decimal
is an unmanaged type but is not a full-range type. The decimal constructor validates that each individual backing field is within a specific range and that the caller has not provided an out-of-range value.
bool
is an unmanaged type but is not a full-range type. A bool is 8 bits in size, but it is expected to have only the value 0
or 1
. A bool with a value 2 .. 255
will never naturally occur in the runtime, but if such a bool is constructed then it could cause unexpected branches to be taken within the app's logic.
This matters for two reasons. First, MemoryMarshal
can be used to convert arbitrary bit patterns to instances of unmanaged types. For full-range unmanaged types this is fine since all possible bit patterns are valid. But if MemoryMarshal
is used to convert an arbitary bit pattern to an instance of DateTime
, decimal
, bool
, or some other unmanaged type that isn't a full-range unmanaged type, it could result in a type safety violation. The resulting instance won't have gone through constructor validation, and if the backing pattern is not well-formed then consumers of this instance could experience undefined behaviors at runtime, including app crashes, infinite loops, or security bypasses. Consumers who use MemoryMarshal
as a glorified reinterpret_cast\<> operator should only ever use it to construct full-range unmanaged types, or they should perform validation upfront to ensure that the bit pattern is legal for the instance it's being copied into.
Second, the shared ArrayPool
is not guaranteed to zero-init the backing memory of arrays that it returns to callers. For example, this could result in ArrayPool<bool>.Shared.Rent
returning a bool[]
whose elements have values other than 1
or 0
. (It is not possible to construct such a bool[]
using standard safe C# code.) Consumers of ArrayPool
should never attempt to dereference array elements that they did not initialize themselves.
This is a bit nuanced, and I'm trying to figure out a good way to weave these concepts into the docs being created. Otherwise I think our guidance on how to use these APIs correctly will be lacking.
(Yes, I know you can tear structs to violate ctor invariants. But that's a rabbit hole I don't want to go down here.)
A bool with a value 2 .. 255 will never naturally occur in the runtime
AFAIK, this is totally legal in the CLR. C# requires unsafe code to construct such a bool (and indeed C# will malfunction when encountering such bools with operator &
). It is purely a C# rule, though.
I realize that with
Unsafe
,MemoryMarshal
, and friends entering wide use, we never formally stated what type safety guarantees (if any) the APIs offer. That is, we never provided a list of what APIs are "safe" and which are "unsafe equivalents" (related: https://github.com/dotnet/runtime/issues/31354).This issue is an attempt to enumerate the APIs on
Unsafe
,MemoryMarshal
, and related types from the perspective of type safety / memory safety. Ultimately I think this needs to be tracked somewhere, but whether that's a .md file in this repo or an official doc page I don't really know.I'm also soliciting feedback on this list. Please let me know if I got something wrong.
I'm not listing APIs which expose raw pointers through their public API surface. APIs which take or return pointers are always assumed unsafe.
System.Runtime.CompilerServices.Unsafe
Add<T>(ref T, int)
Add<T>(ref T, IntPtr)
AddByteOffset<T>(ref T, IntPtr)
AreSame<T>(ref T, ref T)
AsRef<T>(in T)
As<T>(object)
As<TFrom, TTo>(ref TFrom)
ByteOffset<T>(ref T, ref T)
CopyBlock(ref byte, ref byte, uint)
CopyBlockUnaligned(ref byte, ref byte, uint)
InitBlock(ref byte, ref byte, uint)
InitBlockUnaligned(ref byte, ref byte, uint)
IsAddressGreaterThan<T>(ref T, ref T)
IsAddressLessThan<T>(ref T, ref T)
IsNullRef<T>(ref T)
NullRef<T>()
ReadUnaligned<T>(ref byte)
SkipInit<T>(out T)
SizeOf<T>()
Subtract<T>(ref T, int)
Subtract<T>(ref T, IntPtr)
SubtractByteOffset<T>(ref T, IntPtr)
Unbox<T>(object)
WriteUnaligned<T>(ref byte)
System.Runtime.InteropServices.MemoryMarshal
AsBytes<T>(ReadOnlySpan<T>)
AsBytes<T>(Span<T>)
AsMemory<T>(ReadOnlyMemory<T>)
AsRef<T>(ReadOnlySpan<byte>)
AsRef<T>(Span<byte>)
Cast<TFrom, TTo>(ReadOnlySpan<T>)
Cast<TFrom, TTo>(Span<T>)
CreateFromPinnedArray<T>(T[], int, int)
CreateReadOnlySpan<T>(ref T, int)
CreateSpan<T>(ref T, int)
GetArrayDataReference<T>(T[])
GetReference<T>(ReadOnlySpan<T>)
GetReference<T>(Span<T>)
Read<T>(ReadOnlySpan<byte>)
ToEnumerable<T>(ReadOnlyMemory<T>)
TryGetArray<T>(ReadOnlyMemory<T>, ...)
TryGetMemoryManager<T>(ReadOnlyMemory<T>, ...)
TryGetString<T>(ReadOnlyMemory<char>, ...)
TryRead<T>(ReadOnlySpan<byte>, out T)
TryWrite<T>(Span<byte>, ref T)
Write<T>(Span<byte>, ref T)
System.Runtime.InteropServices.SequenceMarshal
TryGetArray<T>(...)
TryGetReadOnlyMemory<T>(...)
TryGetReadOnlySequenceSegment<T>(...)
TryRead<T>(...)
System.Runtime.InteropServices.CollectionsMarshal
AsSpan<T>(List<T>)
System.GC
AllocateArray<T>(int, bool)
AllocateUninitializedArray<T>(int, bool)
GetPinnableReference
patternThough
GetPinnableReference
methods are intended for compiler use within fixed blocks, they're designed to be type-safe when called by hand.string.GetPinnableReference()
ReadOnlySpan<T>.GetPinnableReference()
Span<T>.GetPinnableReference()
Miscellaneous
ArrayPool<T>.Shared.Rent(int)
MemoryPool<T>.Shared.Rent(int)
Notes
In the below notes, I'm using the terms gcref and managed pointer interchangeably.
(1) Arithmetic operations on gcrefs (such as via
Unsafe.Add
) are not checked for correctness by the runtime. The resulting gcref may point to invalid memory or to a different object. See ECMA-335, Sec. III.1.5.(2) It is legal and type-safe to perform comparisons against gcrefs. See ECMA-225, Sec. III.1.5 and Table III.4.
(3) Stripping the "readonly"-ness of a gcref is analogous to using C++'s
const_cast
operator. It could allow mutation of a value that the caller did not intend to make mutable.(4) The runtime will not validate that casts performed by these APIs are correct. This is equivalent to C++'s
reinterpret_cast
operator. Improper casts could result in buffer overruns when accessing the backing value or in incorrect entry points being invoked when calling instance methods.(5) While it is legal to calculate the absolute offset between two gcrefs, it is unverifiable to do so. See ECMA-335, Sec. III.1.5 and Table III.2.
(6) The runtime does not validate the buffer lengths provided to these APIs. Improper usage could result in buffer overruns.
(7) Use of this API could expose uninitialized memory to the caller. See ECMA-335, Sec. II.15.4.1.3 and Sec. III.1.8.1.1. If the uninitialized memory is projected as a non-primitive struct, the instance's backing fields could contain data which violates invariants that would normally be guaranteed by the instance's ctor.
(8) The
sizeof
CIL instruction is always safe. See ECMA-335, Sec. III.4.25.(9) The
unbox
CIL instruction is intended to return a controlled-mutability managed pointer. However,Unsafe.Unbox
returns a fully mutable gcref. This could allow mutation of a boxed readonly struct, which is illegal. See theUnsafe.Unbox
docs for more information.(10) Per ECMA-335, Sec. II.14.4.2, it is not strictly legal for a gcref to point to null. However, all .NET runtimes allow this and treat it in a type-safe fashion, including guarding accesses to null gcrefs by throwing
NullReferenceException
as appropriate.(11) This method performs the equivalent of a C++-style
reinterpret_cast
. This bypasses normal constructor validation, potentially returning values with inconsistent internal state. Projecting unmanaged types as byte buffers may also expose or allow modification of private fields that the type author did not intend, an unsafe reflection equivalent.(12) The runtime does not perform alignment checks. The caller is responsible for ensuring that any returned refs or spans are properly aligned. Most APIs that accept refs or spans as parameters assume that the references are properly aligned, and they may exhibit undefined behavior if this assumption is violated.
(13) This method handles unaligned data accesses correctly.
(14) This method is safe if TFrom and TTo are integral primitives of the same width. For example, TFrom =
int
with TTo =uint
is safe. Integral primitives are:byte
,sbyte
,short
,ushort
,int
,uint
,long
,ulong
,nint
,nuint
, and enums backed by any of these. The caller is responsible for providing a correct TFrom and TTo; the runtime will not validate these type parameters.(15) The runtime will not validate that the array is pre-pinned. Additionally, since
Memory<T>
instances are subject to struct tearing, any instances backed by pre-pinned arrays must be used with caution in multithreading scenarios, as callingMemory<T>.Pin
on a torn instance backed by a pre-pinned array may result in an access violation.(16) If called against a zero-length array or buffer, returns a gcref to where the value at index 0 would have been. It is legal to use such a gcref for comparison purposes (see, e.g.,
Unsafe.IsAddressLessThan
), and the gcref will be properly GC-tracked. However, it is illegal to dereference such a gcref. See ECMA-335, Sec. III.1.1.5.2.(17)
Memory<T>
's implementation is currently backed by one of:T[]
,string
, orMemoryManager<T>
. However, sinceMemory<T>
is an abstraction, new backing mechanisms may be introduced in the future. Callers must account for the runtime allowing all ofTryGet{Array|MemoryManager|String}
to return false; and callers must have a fallback code path in order to remain future-proof.(18) This API may expose the larger buffer beyond the slice bounded by the
Memory<T>
instance. Callers should take care not to reference data beyond the slice provided to them.(19) This API will never return a null reference. If called on an empty
string
, it will return a reference to the null terminator. The return value can always be safely dereferenced.(20) This API will return a null reference if the underlying span contains no elements. Attempting to dereference it will result in a normal
NullReferenceException
being thrown. Note also that unlike pinningstring
instances, the buffer resulting from pinning aReadOnlySpan<char>
orSpan<char>
reference is not guaranteed to be null-terminated. Consumers must not attempt to read off the end of such buffers.(21) Improper use of this API could corrupt the state of the associated object. However, it would not be considered a type safety or memory safety violation.
(22) The runtime will not validate that writes to the ref will satisfy covariant type safety constraints. For example, a local of type
ref readonly object
may actually point to a field typed asstring
. Removing the readonly constraint and treating it as a mutableref object
may allow assignment of a non-string
to the backingstring
field.