[API Proposal]: Safe nullability for `ref struct`s

redgoldlace commented 3 months ago

Background and motivation

Prior to C# 13, the lack of the allows ref struct generic bound and the inability to use ref structs as generic type arguments made it impossible to represent a nullable ref struct.

This has a particular impact on the newer Span and ReadOnlySpan APIs, especially when writing parsing code, where representing the lack of a value is important. It harms the ergonomics and expressiveness of these APIs and makes them more difficult to adopt when writing high-performance code.

As well as making it impossible to represent nullable Spans and ReadOnlySpans directly, this also harms the ergonomics of types that contain Spans, ReadOnlySpans or other ref struct-like types, by indirectly making it impossible to represent a lack of a value. While - in the real world - you would want to simply use a Range instead, this problem is extremely apparent in the case of a hypothetical RegexMatch struct containing a ReadOnlySpan - you cannot have a RegexMatch-returning method that returns null if no match is found.

Ultimately, the arguments for nullable ref structs are the same as the arguments for nullable structs and nullable reference types in the first place: representing a possible lack of a value without using a type-specific "sentinel" value, providing type-safety to downstream code, and catching bugs at compile time rather than at runtime. Since C# supports nullable values in all of these scenarios, a lack of support for ref structs feels like a hole that should be addressed.

As mentioned above, this was broadly impossible to support prior to C# 13. With C# 13 supporting the allows ref struct generic bound, there should no longer be anything at the language/runtime level blocking the implementation of this feature.

API Proposal

From what I can see, there are effectively two options here.

Option 1: Annotate System.Nullable<T> such that T : allows ref struct. From everything I understand of the implementation of allows ref struct, this would be a breaking change, as it would make all instantiations of System.Nullable<T> behave according to the restrictions imposed on ref structs, regardless of whether a specific T was/was not a ref struct. For this reason, it's likely that this is not a practical implementation choice.

Option 2: Implement a new System.Nullable<T>-like type for ref structs. The API for this would be extremely similar to the existing System.Nullable<T> API, and could look vaguely similar to the below:

namespace System;

public ref struct NullableRef<T> where T : struct, allows ref struct
{
    public readonly bool HasValue => /* .... */;    
    public readonly T Value => /* .... */;

    public static implicit operator T?(T value) => /* .... */;
    public static explicit operator T(T? value) => /* .... */;

    // ... along with the other functionality that `System.Nullable<T>` supports
}

The type T? could then be expanded to this hypothetical System.NullableRef<T> type in the case of a ref struct, and System.Nullable<T> otherwise. Since using a ref struct as a generic type argument requires an explicit allows ref struct bound, this should not affect existing code that deals with nullable generic types.
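To make the shape above a little more concrete, here is a minimal sketch of how such a wrapper could be backed by two fields. It assumes C# 13 on .NET 9 or later. The name MaybeRef<T>, its members, and the sample values are hypothetical and chosen only for illustration; the struct constraint and the conversion operators from the proposal are left out to keep the sketch small.

```csharp
using System;

// Hypothetical sketch only; MaybeRef<T> is not a real framework type.
// Assumes C# 13 on .NET 9 or later, where `allows ref struct` is available.
public ref struct MaybeRef<T> where T : allows ref struct
{
    // A field of type T is only legal here because MaybeRef<T> is itself a ref struct.
    private readonly T _value;
    private readonly bool _hasValue;

    public MaybeRef(T value)
    {
        _value = value;
        _hasValue = true;
    }

    public bool HasValue => _hasValue;

    // Mirrors Nullable<T>.Value: throws when no value is present.
    public T Value => _hasValue
        ? _value
        : throw new InvalidOperationException("No value present.");
}

public static class Program
{
    public static void Main()
    {
        // The string literal converts implicitly to ReadOnlySpan<char>.
        var some = new MaybeRef<ReadOnlySpan<char>>("abc");
        MaybeRef<ReadOnlySpan<char>> none = default;

        Console.WriteLine(some.HasValue);     // True
        Console.WriteLine(some.Value.Length); // 3
        Console.WriteLine(none.HasValue);     // False
    }
}
```

Because the wrapper stores a T that may be a ref struct, it has to be a ref struct itself. That is the same restriction that makes changing System.Nullable<T> in place look impractical.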

For usage in pattern matching, it's likely that this would require runtime/language support of some kind, but I'm not entirely sure where. With the API surface and general implementation being so similar, this doesn't seem like it would be a particularly difficult change, and rather just an expansion of existing functionality. That said, looks can be deceiving, so it's possible this is more involved than I realize!

API Usage

// Consume a nullable `ref struct`, such as a nullable `ReadOnlySpan<char>`
if (myParser.maybeReturnsSpan(input) is not {} result)
{
    // Complain about the lack of a value
    throw new SpecificException("Invalid file header");
}

// ... and then continue processing `result`
// Like `System.Nullable<T>`, pattern matching can narrow from `ReadOnlySpan<char>?` to `ReadOnlySpan<char>`
var something = cantBeNullable(result);

// Explicitly check if a value is `null`
if (something is null)
{
    // ...
}

Most of the examples here would apply to System.Nullable<T> as well - the idea is for things to be as ergonomic as possible, and mirror System.Nullable<T> where possible.

Alternative Designs

See the first option mentioned in the API Proposal section above. This is the most obvious alternative, though the actual implementation of a hypothetical System.NullableRef<T> could also differ in some way. A lack of API symmetry would likely be harmful to ergonomics and the ability to work with nullable ref structs, however.

Another notable alternative - though one that's likely far more work! - is adjusting the allows ref struct bound such that the ref struct rules are only imposed on generic allows ref struct types when specifically instantiated with a ref struct, or another generic parameter that is allows ref struct. My understanding of the current behavior is incomplete, so it's possible that this is already the case - meaning that option 1 would suffice, and was just something that was missed in the initial round of allows ref struct additions.

Another alternative, though it goes without saying that it would be the least preferable to me, is to simply not support nullable ref structs in the first place.

Risks

The risks of adjusting System.Nullable<T> were mentioned earlier, but there are some other risks that could arise in downstream code.

What immediately comes to mind is the following:

- Developers could be confused by the addition of a System.NullableRef<T>, and might not know when to use it over System.Nullable<T>.
  - This could be addressed with sufficient documentation of the new type, as well as implementation of compiler/tool warnings when explicitly naming System.NullableRef<T> or using System.NullableRef<T> with a non-ref struct type. These warnings could suggest using T? instead of naming the type directly.
- New and existing APIs may still use sentinel "empty" values instead of nullable ref structs, creating friction.
  - This is a more difficult issue to deal with, but it is possible that overloads/new method implementations could be provided to address this where applicable.
- The implementation burden could be too high, or too far-reaching.
  - I don't believe this will be an issue, but things may be more complex than I'm realizing. So it's worth mentioning!

This is not an exhaustive list, and just what comes to mind presently. It's possible that a feature like this would have other risks I'm not aware of.

huoyaoyuan commented 3 months ago

Related to https://github.com/dotnet/csharplang/discussions/5337 . A ref field isn't that special compared to fields of reference types.

rjgotten commented 2 months ago

> This has a particular impact on the newer Span and ReadOnlySpan APIs, especially when writing parsing code, where representing the lack of a value is important. It harms the ergonomics and expressiveness of these APIs and makes them more difficult to adopt when writing high-performance code.

Really not seeing that, considering you can use the Try pattern with an out parameter to surface both a ReadOnlySpan<char> and a bool indicating whether a token was successfully extracted during a parse operation. In cases where it was not, you just return default - aka ReadOnlySpan<char>.Empty - for the out parameter.

This scenario is only relevant in those situations where you're predisposed to overloading the meaning of null to be false, 'error', etc., and the solution in that case is simply to stop doing that.
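For reference, a minimal sketch of the Try pattern described above. The HeaderParser class, the TryReadKey method, and the sample input are hypothetical names chosen for illustration; only ReadOnlySpan<char> and its IndexOf and Slice members come from the framework.

```csharp
using System;

public static class HeaderParser
{
    // Hypothetical helper: extracts the text before the first ':'.
    // Returns false and an empty span when no ':' is present.
    public static bool TryReadKey(ReadOnlySpan<char> input, out ReadOnlySpan<char> key)
    {
        int colon = input.IndexOf(':');
        if (colon < 0)
        {
            key = default; // equivalent to ReadOnlySpan<char>.Empty
            return false;
        }

        key = input.Slice(0, colon);
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        if (HeaderParser.TryReadKey("Content-Type: text/plain", out var key))
        {
            Console.WriteLine(key.ToString()); // Content-Type
        }

        if (!HeaderParser.TryReadKey("no separator here", out var missing))
        {
            Console.WriteLine(missing.IsEmpty); // True
        }
    }
}
```

The bool result carries the "no value" signal, so the caller never has to inspect the span itself to tell an empty token apart from a failed parse.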
