dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.99k stars 4.67k forks source link

Reevaluate `params ReadOnlySpan<string>` overloads from #77873 #101261

Closed jozkee closed 4 months ago

jozkee commented 5 months ago

Existing compiler rules make that types that define implicit operators for string and string[] incur in a source breaking change. Example:

class Program
{
    static void Main()
    {
        var c = new C();
        Join(",", c); // error CS0121: The call is ambiguous between the following methods or properties: 'Program.Join(string, params string[])' and 'Program.Join(string, params ReadOnlySpan<string>)'
    }

    public static string Join(string separator, params string[] value) => "array";

    public static string Join(string separator, params ReadOnlySpan<string> value) => "span";
}

class C
{
    public static implicit operator string(C c) => "c";

    public static implicit operator string[](C c) => ["c"];
}

One real-world example is Microsoft.Extensions.Primitives.StringValues which is passed into string.Join(string, string[]) on aspnetcore, and that it failed to compile when the changes in https://github.com/dotnet/runtime/pull/100898 flowed into their repo.

API Proposal

Not quite a new API proposal, but a proposal for keeping as is the APIs on https://github.com/dotnet/runtime/issues/77873. These APIs would make existing callsites of their analogous params string[] overloads to fail if a StringValues is being passed as argument.

System.String.Concat(params ReadOnlySpan<string?> values)
System.String.Join(char separator, params ReadOnlySpan<string?> value)
System.String.Join(string? separator, params ReadOnlySpan<string?> value)
System.IO.Path.Combine(params ReadOnlySpan<string> paths)
System.IO.Path.Join(params ReadOnlySpan<string?> paths)
System.Text.StringBuilder.AppendJoin(string? separator, params ReadOnlySpan<string?> values)
System.Text.StringBuilder.AppendJoin(char separator, params ReadOnlySpan<string?> values)

Considering that having the ambiguity of string and string[] implicit operators feels like is against FDGs 5.7.2 Conversion Operators

DO NOT provide a conversion operator if such a conversion is not clearly expected by the end users.

I think the best path would be to accept the source breaking change and expect users to update their codebase, preferably picking the new ReadOnlySpan<string> overloads.

Alternatively

@dotnet/roslyn could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadonlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

dotnet-policy-service[bot] commented 5 months ago

Tagging subscribers to this area: @dotnet/area-system-memory See info in area-owners.md if you want to be subscribed.

RikkiGibson commented 5 months ago

could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadOnlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

string[] is considered better because implicit conversion from string[] to ReadOnlySpan<string> exists, and not the other way around. Thus the compiler thinks of string[] as "more derived/specific" than ReadOnlySpan<string>.

It seems like when overload resolution is comparing a conversion from StringValues->string to invoke Join(ReadOnlySpan) in expanded form, with a conversion from StringValues->string[] to invoke Join(string[]) in normal form, there's no rule which indicates the conversion to string is better than the conversion to string[].

However once an operator StringValues->ReadOnlySpan<string> is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I'm not sure what change is most appropriate to fix the case in this issue, though. I will say that choosing ReadOnlySpan<string> in expanded form feels undesirable for this case as it converts the StringValues to string and implicitly joins the inner strings as part of that.

jozkee commented 5 months ago

However once an operator StringValues->ReadOnlySpan is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[] and avoid weight-in the array_type->span_type conversion in such indirect cases. If that would be possible, I think we could add the implicit ReadOnlySpan<string> operator to fix the issue on our end.

jozkee commented 5 months ago

cc @dotnet/fxdc

RikkiGibson commented 5 months ago

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[]

It looks like "overload resolution priority" was designed to address this specific kind of problem. https://github.com/dotnet/csharplang/pull/7906

CyrusNajmabadi commented 5 months ago

could the compiler rules be adjusted to accommodate StringValues? Why adding a public static implicit operator ReadonlySpan<string> to StringValues doesn't make the compiler select string.Join(string, ReadOnlySpan<string>)? It does make the error go away, though, by selecting the string[] overload.

string[] is considered better because implicit conversion from string[] to ReadOnlySpan<string> exists, and not the other way around. Thus the compiler thinks of string[] as "more derived/specific" than ReadOnlySpan<string>.

It seems like when overload resolution is comparing a conversion from StringValues->string to invoke Join(ReadOnlySpan) in expanded form, with a conversion from StringValues->string[] to invoke Join(string[]) in normal form, there's no rule which indicates the conversion to string is better than the conversion to string[].

However once an operator StringValues->ReadOnlySpan<string> is added, now the overloads can be compared as: oh, I can either invoke in Join(ReadOnlySpan) in normal form, or Join(string[]) in normal form, and the conversion from string[] to ReadOnlySpan indicates that string[] is better.

I'm not sure what change is most appropriate to fix the case in this issue, though. I will say that choosing ReadOnlySpan<string> in expanded form feels undesirable for this case as it converts the StringValues to string and implicitly joins the inner strings as part of that.

Isn't this was @333fred had a proposal for?

AlekseyTs commented 5 months ago

I would expect conversion of StringValues->ReadOnlySpan<string> to be better than StringValues->string[] and avoid weight-in the array_type->span_type conversion in such indirect cases.

What is the basis for these expectations? Usually, expectations should be based on something.

jozkee commented 5 months ago

What is the basis for these expectations?

  1. Given that we implicitly convert arrays into [ReadOnly]Spans in other contexts, feels counter-intuitive to prefer the opposite here.
  2. When Span & the array conversion were introduced, I doubt it was considered that it would resolve conversion fights in other types in favor of the array.
  3. The number of efforts put into spreading Span benefits, e.g. although I'm not sure if such benefits exist in conversions.
AlekseyTs commented 5 months ago

Given that we implicitly convert arrays into [ReadOnly]Spans in other contexts, feels counter-intuitive to prefer the opposite here.

Based on what? I would expect exactly the behavior that we have, and that expectation is based on language rules, not on feelings. From your perspective span types are unconditionally "better" than array types, but compiler cannot get into one's head and read one's mind/feelings. There is a rational behind the language rules. To simplify, language prefers a candidate that takes a "narrower" type for a given argument. For example, an int parameter is preferred over a long parameter. The process of determining what type is "narrower" is based on the set of available implicit conversions. If type A is implicitly convertible to type B and B is not implicitly convertible to A, type A is "narrower" than B. Thus, it is "better" than B. By this rule, int is better than long, and array is "better"/"narrower" than span.

I guess, the point that I am trying to make. There is nothing wrong to have a desire that compiler behaves a certain way. But the desire itself is not a good basis (I would even say it is not a basis at all) to have an expectation that compiler is supposed to behave this way. Basically, there is an analogy between a contract for an API and language rules. The contract defines the expectations, not the other way around. It is, obviously, fine to ask to adjust the language rules to align expected behavior of the compiler with the desired outcome. But, in my opinion, that is quite different from simply expecting things to work a certain way just because there is a desire for that at given moment in time.

CyrusNajmabadi commented 5 months ago

@AlekseyTs the ldm has expressed many times the desire to have the Span based forms "win". So far we've been accomplishing that by patching certain sections of the spec to make that happen. Fred also has a more generalized proposal to nip things in the bud and try to take care of it more uniformly.

It seems very easy and reasonable to me that users would take away from this an intuition on how things should behave. That's how we operate as well. I agree the spec defines what will actually happen. But it is our intuitions that drive is to get the spec to the place we want it to be.

bartonjs commented 5 months ago

Video

namespace Microsoft.Extensions.Primitives
{
    public readonly partial struct StringValues
    {
        public static implicit operator ReadOnlySpan<string>(in StringValues value);
    }
}
RikkiGibson commented 5 months ago

It looks like in StringValues value was probably meant to be scoped in StringValues value. Assuming the goal of in was to minimize value copies, and not to allow the reference to escape into the return value.

stephentoub commented 5 months ago

It looks like in StringValues value was probably meant to be scoped in StringValues value. Assuming the goal of in was to minimize value copies, and not to allow the reference to escape into the return value.

The reference needs to escape into the return value. The argument is in to allow for this to be allocation-free. StringValues stores either a string or a string[], and we want to be able to do something like this:

public static implicit operator ReadOnlySpan<string>(in StringValues value) =>
    value._values is string ? new ReadOnlySpan<string>(in Unsafe.As<object, string>(ref Unsafe.AsRef(in value._values))) :
    new ReadOnlySpan<string>((string[])value._values);

Does adding scoped add any benefits? We've generally only added it when it's necessary.

RikkiGibson commented 5 months ago

Oh interesting! So you want result to be able to hold a reference to the struct field in the single item case. I didn't understand this aspect of the internal layout of StringValues. Carry on :)

Does adding scoped add any benefits? We've generally only added it when it's necessary.

I would lean toward using it when it is meaningful to do so (when the method has references both as inputs and outputs.) Use it unless you either want to return a reference to an input, or want to reserve the ability to do so in the future.

stephentoub commented 5 months ago

Does adding scoped add any benefits? We've generally only added it when it's necessary.

I would lean toward using it when it is meaningful to do so (when the method has references both as inputs and outputs.) Use it unless you either want to return a reference to an input, or want to reserve the ability to do so in the future.

We've opted not to do that because we'd end up explicitly adding it to a boatload of APIs where it doesn't actually affect behavior. We only use it when it's not a nop.

stephentoub commented 5 months ago

@MihaZupan highlighted that we can't do public static implicit operator ReadOnlySpan<string>(in StringValues value) and still have it be safe: after calling this, the returned ReadOnlySpan<string> will be referencing the original location of the StringValues, which could be overwritten, resulting in type safety violations e.g. SharpLab

At that point, we're back to there being no way to make this API safe and keep it allocation-free, since the only way to make it allocation-free for a single string is to take a reference to the original location, which we can't safely do.

Neme12 commented 5 months ago

I would add an AsSpan() method to StringValues as well. Maybe noone explicitly requested it but it's general goodness and I have missed a similar method on many types in .NET and ASP.NET Core similar to this one, though not this one in particular. People just don't make requests for tiny things like this and move on with their lives, but it would be good for the ecosystem to add this wherever it can bring value - i.e. it can be non-allocating. I don't think it can actually be non-allocating on netstandard2.0 because there's no way to create a span from a ref/in (no span constructor and no MemoryMarshal.CreateSpan). That leaves us with two options: make it allocating for that case, or don't expose it at all for netstandard and only for netcore. Third option is not do it at but I would really prefer to avoid us limiting ourselves just because some package also targets netstandard. If it can be better on netcore (i.e. additional method), it should be.

stephentoub commented 5 months ago

It can't be non-allocating on .NET Core, either.

Neme12 commented 5 months ago

Sorry, I missed that. Yeah, that example worries me so much because I have done span-returning methods pointing to readonly structs. Somehow I assumed if it's a readonly struct it's ok. Damn, thanks for the example, but damn... this worries me because this is something you could do with completely safe code. Well, except for the cast. But you'd still be able to overwrite the value of the string by reassigning the struct.

EDIT: Although... I guess then that is kind of expected because readonly span only means you can't change the thing it's pointing to, not that it cannot be changed. Hmmm, we need an ImmutableSpan 😄

But I assumed the struct stored the fields separately for some reason (I guess because that's how I would have done it).

Neme12 commented 5 months ago

Use it unless you either want to return a reference to an input

But in this case, even if this was safe, it would need to return a reference to the input - in the returned span. It needs to be unscoped. Either as an unscoped in parameter in a regular static or extension method, or an instance method (which are scoped by default) marked as UnscopedRef.

But I disagree that everything should be unscoped by default. If the thing can be scoped, it should be scoped (just like readonly), because it allows it to be used in more scenarios - unscoped refs are limiting. (And IMHO scoped should have been the default, just like it is for struct instance methods).

EDIT: Sorry, I see that the fact that it needs to be unscoped was mentioned as well. I will refrain from commenting again without reading the thread first.

Neme12 commented 5 months ago

the ldm has expressed many times the desire to have the Span based forms "win"

I think that would be unfortunate. There are many reasons where a method takes a span as well as an array/string, and the span overload potentially has to allocate, whether for legacy reasons (e.g. Stream, where the default implementation has to potentially allocate and make a copy and forward to the original array-taking one - and I would guess the majority of real-world stream do not override the span one from what I've seen in other people's code), and I've seen it in cases other than legacy reasons as well where a new API was added with overloads for both either string/array and a span and the span-taking one had to allocate. I think it was for String implementing ISpanParsable - the regular parsable doesn't allocate, the span-taking one does. But it was still added for it to be usable in generic scenarios where you only have a span and not a string. But if you do have a string, you should still prefer the string-taking one. After all, string/arrays are more derived and more usable, and can be reused. Spans cannot be reused, and so sometimes you have to allocate an array/string from them.

The only case where preferring span overloads adds value is for params, where the array version allocates invisibly by default (but I might be missing other cases as well in the language where there are implicit allocations similar to this that would be mitigated by span). Of course, If C# didn't have invisible allocations from the start, even this wouldn't be a problem.