dotnet / csharplang

The official repo for the design of the C# programming language
11.02k stars 1.01k forks source link

[Proposal]: First-Class Span Types #7905

Open 333fred opened 3 months ago

333fred commented 3 months ago

First-Class Span Types

Summary

We introduce first-class support for Span<T> and ReadOnlySpan<T> in the language, including new implicit conversion types and consider them in more places, allowing more natural programming with these integral types.

Motivation

Since their introduction in C# 7.2, Span<T> and ReadOnlySpan<T> have worked their way into the language and base class library (BCL) in many key ways. This is great for developers, as their introduction improves performance without costing developer safety. However, the language has held these types at arm's length in a few key ways, which makes it hard to express the intent of APIs and leads to a significant amount of surface area duplication for new APIs. For example, the BCL has added a number of new tensor primitive APIs in .NET 9, but these APIs are all offered on ReadOnlySpan<T>. Because C# doesn't recognize the relationship between ReadOnlySpan<T>, Span<T>, and T[], it means that any developers looking to use those APIs with anything other than a ReadOnlySpan<T> have to explicitly convert to a ReadOnlySpan<T>. Further, it also means that they don't have IDE tooling guiding them to use these APIs, since nothing will indicate to the IDE that it is valid to pass them after conversion. There are also issues with generic inference in these scenarios. In order to provide maximum usability for this style of API, the BCL will have to define an entire set of Span<T> and T[] overloads, which is a lot of duplicate surface area to maintain for no real gain. This proposal seeks to address the problem by having the language more directly recognize these types and conversions.

Detailed Design

https://github.com/dotnet/csharplang/blob/main/proposals/first-class-span-types.md

Design meetings

TahirAhmadov commented 3 months ago

I'm probably missing something obvious, but why doesn't just adding implicit operators suffice?

333fred commented 3 months ago

Implicit operators already exist. They do not suffice because they do not participate in the pieces that I outlined in the proposal.

TahirAhmadov commented 3 months ago

Oh I see what you mean, it's because of generics. Isn't there an opportunity to do something more global here? Say, look at the target type; it's generic; look at implicit operators and try to solve "the generic equation" - is it possible that the expression type satisfies the implicit operator's parameter type with any type argument; if solved, determine the generic argument; use the implicit operator.

333fred commented 3 months ago

These changes require a deep understanding of the relationship between the array type and a type parameter; I don't really see a generalizable mechanism that wouldn't be massively overcomplicated for the class of problem that we're trying to fix here.

KalleOlaviNiemitalo commented 2 months ago

Would you allow an extension method invocation to combine an implicit span conversion with an identity conversion?

using System;

class C {
    void M() {
        object[] objs = {};
        S.X(objs); // OK
        objs.X(); // ?

        (int a, int b)[] tuples = {};
        S.Y(tuples); // OK
        tuples.Y(); // ?
    }
}

static class S {
    internal static void X(this Span<dynamic> span) {}
    internal static void Y(this Span<(int c, int d)> span) {}
}

Or alternatively, allow this identity conversion as part of the implicit span conversion:

KalleOlaviNiemitalo commented 2 months ago

The conversion to Span\<T> throws in the covariant case

using System;

class C {
    static void Main() {
        object[] objs = new string[1];
        S.X(objs); // implicit conversion to Span<object> throws ArrayTypeMismatchException
        objs.X();  // implicit span conversion presumably likewise
    }
}

static class S {
    internal static void X(this Span<object> span) {}
}

The proposal defines the implicit span conversion to be a standard implicit conversion. It's then also a pre-defined conversion according to §10.4.1:

The standard conversions are those pre-defined conversions that can occur as part of a user-defined conversion.

This paragraph in §10.2.1 then has to be changed:

The pre-defined implicit conversions always succeed and never cause exceptions to be thrown.

333fred commented 2 months ago

Would you allow an extension method invocation to combine an implicit span conversion with an identity conversion?

No, that is not intended by this proposal.

KalleOlaviNiemitalo commented 2 months ago

Does that mean:

using System;

class C {
    void M() {
        (int a, int b)[]? array1 = null;
        (int c, int d)[]? array2 = null;
        array1.N(); // calls X.N(array1)
        array2.N(); // calls Y.N(array2)
        array1.AsSpan().N(); // error CS0121, ambiguous
    }
}

static class X {
    public static void N(this Span<(int a, int b)> span) {}
}

static class Y {
    public static void N(this Span<(int c, int d)> span) {}
}

I think this would be the first time that tuple element names affect which method is called.

Neme12 commented 1 month ago

I'm firmly against the idea that the language would pick a span overload over an array or string overload and I think it would be really unfortunate. There are issues with that, as I mentioned here:

I think that would be unfortunate. There are many reasons where a method takes a span as well as an array/string, and the span overload potentially has to allocate, whether for legacy reasons (e.g. Stream, where the default implementation has to potentially allocate and make a copy and forward to the original array-taking one - and I would guess the majority of real-world stream do not override the span one from what I've seen in other people's code), and I've seen it in cases other than legacy reasons as well where a new API was added with overloads for both either string/array and a span and the span-taking one had to allocate. I think it was for String implementing ISpanParsable - the regular parsable doesn't allocate, the span-taking one does. But it was still added for it to be usable in generic scenarios where you only have a span and not a string. But if you do have a string, you should still prefer the string-taking one. After all, string/arrays are more derived and more usable, and can be reused. Spans cannot be reused, and so sometimes you have to allocate an array/string from them.

The only case where preferring span overloads adds value is for params, where the array version allocates invisibly by default (but I might be missing other cases as well in the language where there are implicit allocations similar to this that would be mitigated by span). Of course, If C# didn't have invisible allocations from the start, even this wouldn't be a problem.

An array is a strictly derived type from a span. It has more featueres, it is more usable, and it can be put on the heap. If all you get is a span, you cannot put it on the heap or reuse it as a field, and you might have to allocate and copy the data to get something more usable. String, on the other hand, can be reused. If what you already have is an array or a string, you should call those corresponding overloads. Preferring span is only useful in cases like params span when you don't already have an array you could pass in, and it will implicitly allocate one. But that's an issue with the language implicitly allocating.

I'd be fine with the language understanding span types, but only if it still considered them base types of arrays or strings - not the other way around. It would be nice to avoid the duplication of overloads on MemoryExtensions for both Span<T> and ReadOnlySpan<T> that's there for basically every method.

CyrusNajmabadi commented 1 month ago

I'm firmly against the idea that the language would pick a span overload over an array or string overload

We've already decided, based on empirical need, that we will pick the span overload.

There are issues with that, as I mentioned

These issues do not apply in aggregate. And it would be bad for the ecosystem as a whole if we punished the majority of cases for a tiny subset of cases where there might be a problem. Those tiny subset should be designed to not have an issue here, versus requiring the rest of the ecosystem to try to get the perf that is sensible by default.

CyrusNajmabadi commented 1 month ago

An array is a strictly derived type from a span. It has more featueres, it is more usable, and it can be put on the heap

That's a negative. it cannot be put on the stack. The span approach means you can operate on both uniformly. And, in the normal common case, you get the stack benefits across the ecosystem. It is strictly the type that should be preferred. And that is borne out by the teams looking at this and specifically wnating both the most flexibility and the best perf.

Neme12 commented 1 month ago

But why would picking Span<T> when a method has an array overload as well be better in any case at all?

TahirAhmadov commented 1 month ago

@Neme12 the method knows how it's going to use the parameter, so it (in the overwhelming majority of cases) knows if it needs to allocate from a span (thus negating the performance gains of spans in the first place or even making it worse in some cases). Why don't we use the overload resolution attribute to solve this? Basically, if the span overload knows it's better, it has an attribute with a higher value, and if not, the array/string one will have a higher value.

CyrusNajmabadi commented 1 month ago

But why would picking Span when a method has an array overload as well be better in any case at all?

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

TahirAhmadov commented 1 month ago

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

To clarify, I think she is asking why we would call the span overload if you already have an array.

Neme12 commented 1 month ago

So that the caller doesn't need to heap allocate. Which will often be the case the majority of the time.

But as I said, it only allocates implicitly for params. Otherwise, when you do already have an array and a method has overloads for both array and span, it's completely fine to call the array one.

CyrusNajmabadi commented 1 month ago

you cannot put it on the heap

The vast vast vast majority of cases do not do this. And they would not do this either if they got an array (as the data could change out from underneath them). So a copy needs to happen in the array case as well here. So spans are no worse.

CyrusNajmabadi commented 1 month ago

Otherwise, when you do already have an array and a method has overloads for both array and span, it's completely fine to call the array one.

Which is exactly what the lang/compiler would do as that would be an identity conversion.

TahirAhmadov commented 1 month ago

Basically, the span overload would only be preferred if it's a collection expression and we target type it to span?

Neme12 commented 1 month ago

So why would picking a Span<T> overload be better than an existing array overload in any case other than params where the language otherwise implicitly allocates an array?

CyrusNajmabadi commented 1 month ago

Basically, the span overload would only be preferred if it's a collection expression

No. It would also be preferred for things like params.

TahirAhmadov commented 1 month ago

No. It would also be preferred for things like params.

Right, I didn't mention it because @Neme12 already said it.

CyrusNajmabadi commented 1 month ago

Ok. So, as i've been saying. We prefer span because it is better when a non-identity conversion is involved. If you have an array, and it's an identity conversion, then identity conversion rules continue to apply. :)

Neme12 commented 1 month ago

Oh sorry, I misunderstood the proposal then. I should have read all of the detailed design before commenting.

CyrusNajmabadi commented 1 month ago

i thnk you guys may be confusing what an identity conversion is versus the 'implicit span conversion' that fred is talking about adding. That's a implicit widening conversion. not an identity conversion. An identity conversion si still first priority in the conversion list.

CyrusNajmabadi commented 1 month ago

no worries :)

Neme12 commented 1 month ago

I agree it definitely makes sense to prefer Span in cases like collection expressions, where if it called the array one instead, it would have to allocate an array/list/whatever the natural type ends up being.

Neme12 commented 1 month ago

However, the language has held these types at arm's length in a few key ways, which makes it hard to express the intent of APIs and leads to a significant amount of surface area duplication for new APIs. For example, the BCL has added a number of new tensor primitive APIs in .NET 9, but these APIs are all offered on ReadOnlySpan<T>. Because C# doesn't recognize the relationship between ReadOnlySpan<T>, Span<T>, and T[], it means that any developers looking to use those APIs with anything other than a ReadOnlySpan<T> have to explicitly convert to a ReadOnlySpan<T>.

I'm still not understanding this part of the motivation though. When I look at the linked API proposal, I don't see any duplication. And why would a developer have to convert to ReadOnlySpan<T> explicitly when they have something like an array, and an implicit conversion exists? Or were these supposed to be extension methods originally?

333fred commented 1 month ago

When I look at the linked API proposal, I don't see any duplication.

Because they're counting on this proposal to come in and not force them to duplicate :slightly_smiling_face:.

And why would a developer have to convert to ReadOnlySpan explicitly when they have something like an array, and an implicit conversion exists?

Because the implicit conversion is a user-defined conversion, which doesn't carry through in a number of places, including type inference for these highly-generic cases.

Neme12 commented 1 month ago

Ohh, I see what the issue is now (the T cannot be inferred), thanks.

My concerns were about making spans to be considered more specific than e.g. arrays with respect to conversions, as opposed to the opposite, like today. Although now than I'm thinking about it, if a type has implicit conversions to both arrays and spans, the span ones could indeed be better as they should be non-allocating (at least that's the assumption for implicit conversions to spans), and could return a span directly over a struct field.

But I'd still definitely want to avoid a scenario where given an existing array and method overloads for both arrays and spans, the span one is picked anyway even though I have an exact type.