fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features
340 stars 21 forks source link

Pattern matching spans of chars against constant strings #1351

Open brianrourkeboll opened 2 months ago

brianrourkeboll commented 2 months ago

Proposal

I propose that we allow pattern-matching values of type ReadOnlySpan<char> and Span<char> against constant strings.

match "abc123".AsSpan 1 with
| "bc123"  -> printfn "Matches."
| "abc123" -> printfn "Doesn't match."
| "a"      -> printfn "Doesn't match."
| _        -> printfn "Doesn't match."
let [<Literal>] SomeConstant = "abc123"

let f (span : ReadOnlySpan<char>) =
    match span with
    | SomeConstant -> printfn "Matches."
    | _            -> printfn "Doesn't match."
let span = "abc123xyz".AsSpan ()
let lastDigit = span.LastIndexOfAnyInRange ('0', '9')

match span.Slice lastDigit with
| ""     -> printfn "Ends with a digit."
| suffix -> printfn $"Ends with '{suffix.ToString ()}'."

Compare the equivalent C# proposal (implemented in C# 11).

We would do this by first automatically generating calls to System.MemoryExtensions.AsSpan on a constant string being matched against, and then by passing the resulting span and the input span to the appropriate System.MemoryExtensions.SequenceEqual overload.

The existing way of approaching this problem in F# is...

Notes

Pros and cons

The advantages of making this adjustment to F# are

The disadvantages of making this adjustment to F# are

Alternatives

Extra information

Estimated cost (XS, S, M, L, XL, XXL):

Related suggestions: (put links to related suggestions here)

Affidavit (please submit!)

Please tick these items by placing a cross in the box:

Please tick all that apply:

For readers

If you would like to see this issue implemented, please click the :+1: emoji on this issue. These counts are used to generally order the suggestions by engagement.

vzarytovskii commented 2 months ago

I am personally not a huge fan of introducing intrinsics to the compiler it should know of to be able to optimize it. Implicit conversion looks better tbh, but will involve additional hidden conversions/allocations. Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

brianrourkeboll commented 2 months ago

will involve additional hidden conversions/allocations

Hmm, it would indeed involve hidden invocation of op_Implicit and creation of ReadOnlySpan<char>s, but it wouldn't involve any heap allocations.

Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

Separately, while it's not exactly the same, there is already some precedent in the language for syntactic crossover between string-like things and array-like things:

// Looks like a string...
match "abc"B with
// ...But it's actually an array.
| [|0x61uy; 0x62uy; 0x63uy|] -> printfn "Matches."
| _ -> printfn "Doesn't match."
baronfel commented 2 months ago

From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.

vzarytovskii commented 2 months ago

will involve additional hidden conversions/allocations

Hmm, it would indeed involve hidden invocation of op_Implicit and creation of ReadOnlySpan<char>s, but it wouldn't involve any heap allocations.

Yeah, I meant if we do type directed the other way (span -> string) for some cases, where we can't (?) do AsSpan.

Also, confusing from the usability/type system perspective - "Why do I compare Span to string"?

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

Yeah, it can just be somewhat confusing for some people, I suppose.

vzarytovskii commented 2 months ago

From a design perspective, it's also nice as a user when the easy thing is also the fast/correct thing.

I suppose it's a tradeoff - do exactly what user wrote (or close to it) versus try and rewrite/optimize things. We already do both, so no really huge directional change.

I suppose it's up to @dsyme now - whether we want to do more with spans (and byreflikes in general), since it's clearly where Roslyn is going (and clr in general).

charlesroddie commented 2 months ago

I don't understand this semantically. "bc123" appears to be a string, and so the match appears to be checking for equality, but a string is not a ReadOnlySpan so equality doesn't make sense.

brianrourkeboll commented 2 months ago

"bc123" appears to be a string, and so the match appears to be checking for equality, but a string is not a ReadOnlySpan so equality doesn't make sense.

Yeah, I get that it "is" not a string, and that it might not "feel" F#-y if you focus on the current type implications of string literals rather than viewing them as a shape or pattern of chars to match against — see my response above (https://github.com/fsharp/fslang-suggestions/issues/1351#issuecomment-1960064255):

While I can sympathize with this perspective somewhat, once you know what spans are and are for, then it's annoying not to be able to mix and match spans created from strings and strings themselves — C# enables this on purpose by using op_Implicit to autoconvert strings to spans in as many scenarios as possible. I of course wouldn't argue for more implicit conversions in general, but I think the benefit would outweigh the potential confusion in this specific scenario.

The whole point of spans is to serve as windows into contiguous stretches of memory — and a string literal is just special syntax for a stretch of contiguous chars.

Sure, when a string literal is used as an object, an object is created to point at that stretch of chars, but — as the treatment of strings and spans in C# string slicing and pattern-matching as well as the design of System.String itself in the BCL show — strings and spans are meant to be treated as more or less interchangeable, including in pattern-matching.

I do understand that the usage of op_Implicit, etc., would probably not be how it would have been done if the idea of spans had existed in .NET (and thus F#) from day one, and that it doesn't fit perfectly in the F# tradition, but it seems clear that testing equality between a span of chars and a string is meant to be a meaningful operation from the .NET perspective.

So in my mind the change here doesn't imply some new type equivalence between spans and strings, but rather specifically enables treating string literals as patterns representing spans of characters against which, well Spans or ReadOnlySpans of characters can be matched.