Proposal: list pattern

gafter commented 4 years ago

Allow is [ 1, 2, 3 ] (list pattern), ~~is [length] (length pattern)~~ and is [ 1, ..var x, 5 ] (slice pattern).
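For readers unfamiliar with the shapes being proposed, here is a minimal sketch, assuming the syntax as it later shipped in C# 11 (the struck-through length pattern is omitted). The array and variable names are illustrative and do not come from the proposal.

```csharp
using System;

int[] numbers = { 1, 2, 3, 4, 5 };

// List pattern: matches only when the length is exactly 3 and each element matches.
bool exact = numbers is [1, 2, 3];

// Slice pattern: ".." absorbs zero or more elements between the fixed ends.
int[] middle = Array.Empty<int>();
if (numbers is [1, .. var slice, 5])
{
    middle = slice;
}

Console.WriteLine(exact);                      // False (length is 5, not 3)
Console.WriteLine(string.Join(", ", middle));  // 2, 3, 4
```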

Speclet: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/list-patterns.md

See https://github.com/dotnet/csharplang/pull/3245

LDM Discussions:

HaloFour commented 4 years ago

Awesome. Is this going to build on the linked proposal (with your additional feedback) or is this to track the work of designing list/collection/dictionary/etc patterns in general?

gafter commented 4 years ago

@HaloFour This is to track answering questions like those.

alrz commented 4 years ago

list/collection/dictionary/etc patterns

I wonder how you'd imagine dictionary/indexer patterns work and what use cases you have in mind.

Syntax-wise it could be {[constant]: pattern}, but that's useless because any common type that exposes an indexer throws if the index is out of range.

gafter commented 4 years ago

Maybe we could use a TryGet method.

alrz commented 4 years ago

To me, that's too specific to have a dedicated syntax for. I think we should enable patterns for all Try-like methods which is essentially what user-defined positional patterns are.

map is TryGetValue("constant key", pattern) // maybe with another name

This could be made to work only on "some" methods, as mentioned in https://github.com/dotnet/csharplang/issues/1047#issuecomment-440254430

The syntactic symmetry with indexer initializers makes it attractive; however, the fact that it would not call the indexer might be confusing.
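For comparison, this is roughly how the same check can be written without any dedicated syntax: call TryGetValue explicitly and apply a pattern to the out value. It is only a sketch; the Describe helper and the sample dictionary are hypothetical.

```csharp
using System;
using System.Collections.Generic;

var map = new Dictionary<string, int> { ["constant key"] = 42 };

// Call TryGetValue explicitly, then apply a pattern to the out value.
static string Describe(Dictionary<string, int> m) =>
    m.TryGetValue("constant key", out var value) && value is > 0
        ? $"positive: {value}"
        : "missing or non-positive";

Console.WriteLine(Describe(map));                            // positive: 42
Console.WriteLine(Describe(new Dictionary<string, int>()));  // missing or non-positive
```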

HurricanKai commented 4 years ago

Would this allow something like

Array: { Length: 1, [1] }

checking whether there is only one element and whether that element is 1? Also, looking at the PR it's unclear to me whether this would allow checking for just the 5th element for example? Looks like that's not possible?

alrz commented 4 years ago

@HurricanKai

The pattern [1] already includes the check for length=1 so you don't need to add anything else.

To match Nth element you could prepend N-1 discards - it tends to get long but it is definitely possible.

The team is interested in supporting indexers, so using a range indexer pattern we could support { [5]: var elem } and generate e.Length > 5 && e[5] is var elem (we'd only accept constant indexer arguments so we can calculate the minimum size).

I'm not sure if that's common enough for dedicated support; plus, it won't work with generic dictionaries, which is probably a deal breaker.
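A small sketch of the two points above, assuming the list pattern syntax as shipped in C# 11: leading discards reach a given position, and a fixed-length pattern already implies the length check. The helper names are hypothetical, and the "indexer" version is only an approximation of what a compiler might generate.

```csharp
using System;

// Sixth element (index 5) via five leading discards; the trailing ".." accepts longer inputs.
static int? SixthViaPattern(int[] items) =>
    items is [_, _, _, _, _, var elem, ..] ? elem : (int?)null;

// Roughly the hand-written equivalent: index 5 needs at least six elements.
static int? SixthViaIndexer(int[] items) =>
    items.Length > 5 ? items[5] : (int?)null;

int[] longEnough = { 10, 11, 12, 13, 14, 15, 16 };
int[] tooShort = { 10, 11, 12, 13, 14 };

Console.WriteLine(SixthViaPattern(longEnough));        // 15
Console.WriteLine(SixthViaIndexer(longEnough));        // 15
Console.WriteLine(SixthViaPattern(tooShort) is null);  // True

// [1] implies Length == 1, so no separate length check is needed.
Console.WriteLine(new[] { 1 } is [1]);     // True
Console.WriteLine(new[] { 1, 2 } is [1]);  // False
```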

HurricanKai commented 4 years ago

I'd generally much prefer [idx]: { ... } because it looks much more natural inside of more pattern matching. In fact, I tried typing it out like that today because I thought it would work 😄 i.e. [2]: { [5]: { Length: 5 } } on a string[][] would be quite amazing if possible.
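The indexed form above did not become part of the language as far as this thread shows, but a similar check can be sketched with nested list patterns and discards, assuming C# 11 syntax. The Matches helper and the sample data are hypothetical.

```csharp
using System;

// "Element 2 has an element 5 whose Length is 5", written with discards instead of [idx]:.
static bool Matches(string[][] jagged) =>
    jagged is [_, _, [_, _, _, _, _, { Length: 5 }, ..], ..];

string[][] rows =
{
    Array.Empty<string>(),
    Array.Empty<string>(),
    new[] { "a", "b", "c", "d", "e", "hello" },
};

Console.WriteLine(Matches(rows));  // True
```

The discards get long quickly, which is the trade-off mentioned earlier in the thread.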

Unknown6656 commented 3 years ago

Does this proposal also include list decomposition? e.g. (syntax debatable):

List<int> my_list = .....;

{ int first, int second, .., int last } = my_list;

333fred commented 3 years ago

No. That's https://github.com/dotnet/csharplang/issues/4082.

CyrusNajmabadi commented 3 years ago

@jcouv We've been hearing quite a bit about this from several community channels (twitter, discord, etc.) Can we bring this to an upcoming meeting to discuss and make some decisions on? Thanks!

jcouv commented 3 years ago

@CyrusNajmabadi Will do.

mungojam commented 3 years ago

Reading the design meeting notes, if the eventual outcome is that we can also replace:

var things = new [] { "Car", "Motorbike", "Cab" };

with

var things = ["Car", "Motorbike", "Cab"];

I'll be very pleased. It's one of the things that jars when I work with 3 different languages in my day job (C#, TypeScript/JSON and Python).

mpawelski commented 3 years ago

I really like the direction the team is going for with list patterns. Especially this part (emphasis mine)

Return to the original proposal syntax, using square brackets ([]) to denote a list pattern. This breaks with the correspondence principle, but it does have stronger parallel with other languages, has a natural base case, and we could potentially add a new creation form that achieves correspondence (and take the time to address things like ImmutableArray, which cannot be initialized by collection initializers today). ... Today, square brackets are used for indexing operations and for specifying the length of an array. Nothing in C# uses them to denote a group of things that is a collection. There are proposals to use these brackets for an improved version of collection initializers though, giving us an opportunity for future fulfillment of the correspondence principle, even if it won't be fulfilled on initial release.

from LDM June 14th.

Having the possibility to use this "simplified" bracket syntax when the target collection type is known would be great. If I understand correctly, we could do something like this, right?

string[] ar1 = ["foo", "bar"];
List<int> list = [1, 2, 3];

public record Person(string FirstName, string LastName);

ImmutableArray<Person> people = [new("James", "Bond"), new("John", "Smith")];

But if this new syntax were used without a target type (like in the var things = ["Car", "Motorbike", "Cab"]; example by @mungojam), it would also be nice to have array as the "natural type" (just like the planned improvements for lambdas in C# 10).

Then we just need something similar for dictionaries (#414) and working with basic collections (lists and dictionaries) would be much more pleasant in C# 😁

orthoxerox commented 3 years ago

I think the natural type of the new syntax should be List<T>, not T[]. Arrays already have their own creation syntax and lists are used much more often than bare arrays.

mpawelski commented 3 years ago

I was thinking about having T[] as the natural type because this is the "lightest" collection, and succinct syntax will probably lead developers to use it more than the old way of creating arrays. If the natural type were List<T>, then people might use lists even in cases where an array is enough.

But on second thought, maybe it doesn't matter?

If we ever get a nice dictionary literal (like described in #414) for creating and pattern matching dictionaries, and also want a "natural type" so it can be used with var, then we would probably choose Dictionary<TKey, TValue> anyway. Then choosing List<T> for the new "list syntax" would make sense (since it's also a mutable collection that can grow, unlike an array).

theunrepentantgeek commented 3 years ago

In my own code, I almost never want Array - it's literally the 1% case. Maybe 60% of the time I want List<T>, the rest of the time it'll be something immutable, or maybe a Set<T>

There is no natural type for list construction in .Net, because we have a rich set of well supported collection types.

This is unlike languages like Go which have a privileged collection type (eg slice) that stands alone because the language provides exclusive support that can't be extended to custom types.

jnm2 commented 3 years ago

There is no natural type for list construction in .Net, because we have a rich set of well supported collection types.

Not yet, but I think there's a good reason to boost List<T> to that level as the single most-used collection type (and for no other reason). It would feel as legitimate to me as it did to boost the single most-used delegate type for each signature (Func and Action) as natural types for delegates.

alrz commented 3 years ago

Not yet, but I think there's a good reason to boost List<T>

Why not ImmutableArray<T> then? Roslyn is a good example where that's the dominant collection type.

333fred commented 3 years ago

Folks, remember that this issue is for list patterns. If we want to start debating what a new collection syntax means in initialization, best to either find an existing issue on the topic or open a new discussion.

sajjadarashhh commented 2 years ago

Is there any chance of having syntax like this in the future: int[] firstArray = [1, 2, 3]; int[] secondArray = [...firstArray, 5, 6, 7, 8, 9];

333fred commented 2 years ago

Is there any chance of having syntax like this in the future: int[] firstArray = [1, 2, 3]; int[] secondArray = [...firstArray, 5, 6, 7, 8, 9];

@sajjadarashhh https://github.com/dotnet/csharplang/issues/5354

CyrusNajmabadi commented 2 years ago

In the collection-literal discussion (https://github.com/dotnet/csharplang/issues/5354) a point was raised that using .. for list-patterns was problematic if we wanted the corresponding expression syntax for splatting. Specifically, the collection literal proposal suggests the parallel [e1, ..e2] syntax to go along with the [p1, ..p2] pattern syntax.

This creates a syntactic ambiguity for collection expressions. Specifically, say someone wants a collection of ranges. They cannot write [..i1, ..i2], as that will be interpreted as splatting, not ranges. There is a workaround of [(..i1), (..i2)], but that's somewhat unfortunate. The feedback suggests switching our syntax here (for both) away from two dots (..) to three dots (...). So a list pattern would be written [p1, ...p2] and a collection literal would be [e1, ...e2]. A collection of ranges would need no special syntax at that point.
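To make the ambiguity concrete, here is a sketch that assumes collection expressions as they eventually shipped in C# 12, where the two-dot spelling was kept. The variable names are illustrative only.

```csharp
using System;

int i1 = 2, i2 = 4;
int[] source = { 10, 20, 30, 40, 50 };

// Parenthesised, each element is a Range value rather than a spread.
Range[] ranges = [(..i1), (..i2)];

// Unparenthesised, ".." spreads the elements of a collection into the new one.
int[] head = source[..i1];
int[] combined = [..head, ..source];

Console.WriteLine(ranges.Length);    // 2
Console.WriteLine(combined.Length);  // 7
```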

This might also benefit us in the future if we ever wanted to pattern match ranges themselves. e.g. range switch { >1..<10.

In a real sense, by squatting on .. for list-patterns we have a correspondence mismatch between .. in a pattern and .. in an expression, which isn't great and may be limiting in the future in other ways.

333fred commented 2 years ago

Well, the main issue is that .. is a slice pattern, which is exactly what .. is used for today. var slice = list[..^1]; is the corresponding slice expression to the pattern [.. var slice, _].
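The correspondence described here can be checked directly. This is a small sketch assuming C# 11 list patterns on arrays; the variable names are illustrative.

```csharp
using System;
using System.Linq;

int[] list = { 1, 2, 3, 4 };

// Slice expression: everything except the last element.
int[] viaExpression = list[..^1];

// Slice pattern: bind everything except the last element.
int[] viaPattern = Array.Empty<int>();
if (list is [.. var slice, _])
{
    viaPattern = slice;
}

Console.WriteLine(viaExpression.SequenceEqual(viaPattern));  // True
```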

CyrusNajmabadi commented 2 years ago

Well, the main issue is that .. is a slice pattern

Right. But that hasn't shipped yet right? So i think the issue to raise with LDM is: should we have a different syntax (like ...) for the slice pattern?

333fred commented 2 years ago

Right. But that hasn't shipped yet right? So i think the issue to raise with LDM is: should we have a different syntax (like ...) for the slice pattern?

I think you missed my point: slicing is already in the language, as the range operator. We have symmetry here between the variable initializer form and the list slicing form. It's worth bringing up, but I think that, no matter what we do, we're going to have an inconsistency somewhere.

mpawelski commented 2 years ago

Ok, I think I'm starting to get the rationale for the .. syntax in list patterns and why it's called "slice pattern".

I'm a great proponent of having symmetry between construction expressions and deconstruction patterns in Pattern Matching™

But for me this "slice pattern" doesn't feel that symmetric to "slice expression" (already existing in C# since Ranges were introduced). IMO it feels much more natural to think about it as "spread pattern" that is symmetric to spread_element in new "collection literals" proposal.

And because we don't have list patterns yet, I suggest looking at the current "slice pattern" as a "spread pattern" (and possibly renaming it), and changing the syntax from two dots (..) to three dots (...). For spread_element in the list literal proposal, I suggest changing it from .. s1 to ... s1.

Why don't I see the current slice pattern as symmetric to the current "slice expression"? Let's look at a couple of examples:

| construction | deconstruction |
| --- | --- |
| `var slice = arr[..^1]` | `[.. var slice, _] => slice` |
| `var slice = arr[1..]` | `[_, .. var slice] => slice` |
| `var slice = arr[3..]` | `[_,_,_, .. var slice] => slice` |
| `var slice = arr[4..^2]` | `[_,_,_,_, .. var slice, _, _] => slice` |
| `var slice = arr[4..6]` | not possible? Or maybe something like this: `[_,_,_,_, var a, var b, ..] => { var slice = new[] { a, b }; }` |

If we want slice pattern to be "symmetric" then this syntax makes more sense:

| construction | deconstruction |
| --- | --- |
| `var slice = arr[..^1]` | `[.. var slice ^1] => slice` |
| `var slice = arr[1..]` | `[1 .. var slice] => slice` |
| `var slice = arr[3..]` | `[3 .. var slice] => slice` |
| `var slice = arr[4..^2]` | `[4 .. var slice ^2] => slice` |
| `var slice = arr[4..6]` | `[4 .. var slice 6] => slice` |

This is much more symmetric, but it looks very strange and unfamiliar to me, probably because I haven't seen such a pattern in any other language. I also don't think this syntax gives us much: you will rarely want to match a small slice in the middle of a very long collection, which is the only case where skipping all the _,_,_,_,_ noise would be beneficial.

But if you look at ... in a list pattern as a "spread pattern" symmetric to a "spread expression" (spread_element in the list literal proposal), then it looks much more "symmetric":

| construction | deconstruction |
| --- | --- |
| `var collection = [a, b, ...otherCollection]` | `[var a, var b, ... var otherCollection]` |
| `var collection = [a, ...otherCollection, b, c]` | `[var a, ... var otherCollection, var b, var c]` |
| `var collection = [...otherCollection, a, b]` | `[... var otherCollection, var a, var b]` |

And I think this "symmetry" between "spread expression" and "spread pattern" is much clearer to see by an average developer than this "symmetry" between "slice expression" and "slice pattern".
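The construction/deconstruction symmetry argued for here can be sketched as a round trip. Note that this uses the two-dot spelling, since that is what I know to compile (collection expressions in C# 12, list patterns in C# 11); the three-dot spelling in the tables is the commenter's proposal. Variable names are illustrative.

```csharp
using System;

int[] otherCollection = { 2, 3 };
int a = 1, b = 4, c = 5;

// Construction: spread otherCollection between fixed elements.
int[] collection = [a, ..otherCollection, b, c];

// Deconstruction: the pattern mirrors the expression shape.
int[] recovered = Array.Empty<int>();
if (collection is [var a2, .. var rest, var b2, var c2])
{
    recovered = rest;
    Console.WriteLine($"{a2} [{string.Join(", ", rest)}] {b2} {c2}");  // 1 [2, 3] 4 5
}
```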

alrz commented 2 years ago

[(..i1), (..i2)]

Assignments aren't allowed in collection initializers and in practice it hasn't been a problem. I think this has the same degree of likelihood to be actually useful. Is it worthwhile to break the symmetry across all three (ranges, patterns, collection literals) just to support one scenario out-of-the-box? (And the second you want to use ... for anything else, you're back to square one.)

CyrusNajmabadi commented 2 years ago

I'm also concerned that by using ... we may limit future applicability of that term (like using it for splatting tuples). But that may also be an argument in support of it. A 'spread' operator that works on sensible scenarios.

CyrusNajmabadi commented 2 years ago

Note, I'm on the fence here. I like that .. works reasonably well. But there are definitely cracks and inconsistencies with it that I think are worth discussing. My gut feeling is that we won't change anything here, but I'd like to bring it up for discussion anyway.

TonyValenti commented 2 years ago

@CyrusNajmabadi - Regarding the "AddRange" operator (.. vs ...), I think that a new operator might be a better choice and I propose #:

var v1 = new[]{ 5, 7, 11};
var v2 = new[] {17, 19, 23};
var values = [1, 2, 3, #v1, 13, 17, #v2[1..], 27];

Also, I would expect slicing to be used inside of the literal and I think that syntax like:

var values = [ ...v2[1..] ];

doesn't particularly look nice with all the periods in it.

CyrusNajmabadi commented 2 years ago

@v1 is already legal code and has meaning today in c# :)

TonyValenti commented 2 years ago

@CyrusNajmabadi Really?! What's it mean? I can't find the @ operator at this page: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/operators/

bernd5 commented 2 years ago

see: https://docs.microsoft.com/en-us/dotnet/csharp/fundamentals/coding-style/identifier-names

CyrusNajmabadi commented 2 years ago

It's not even an operator. It's just a legal syntax for all identifiers.

TonyValenti commented 2 years ago

@CyrusNajmabadi @bernd5 Thanks for the clarification. I never knew about that! In that case, perhaps # should be the operator. Thinking about it, I like the fact that # can be read as "Number, Count, or Enumerate". ie:

var values = [1, 2, 3, #v1, 13, 17, #v2[1..], 27];

ahmad2smile commented 2 years ago

A new operator with 2 dots .. to take the rest is definitely OK. But I think using/saving 3 dots ... for some other operation would definitely confuse people (at the very least people coming from the JS world, which is a very big world).

My suggestion would be 3 dots ... as the spread and rest operator, keeping 2 dots .. for ranges; or at the very least use the same operator for spread and rest.

ratcodes commented 2 years ago

Hi there! Love the proposal; super excited about where it goes! There are a few ideas that came to mind related to list patterns—if this is incorrect posting format, please let me know and I can promptly fix it.

When pattern matching against an IEnumerable<T>, where patterns rely on Length or Count being defined or might want to return a sub-collection of the IEnumerable<T>, could the yield keyword be used instead of throwing an exception?

Example:

IEnumerable<int> collection = GetSomeCollection();
var item = collection yield switch 
{
    [int first, _] => $"first item is: {first}",
    _ => "default"
};

// or

var items = collection yield switch 
{
    // maybe illegal syntax
    [int a, int b, _] => [a, b],
    _ => [-1]
};

This would make it explicit that the collection might require deferred iteration on the switch expression in order to validate the pattern if there is no suitable Length or Count property (after verifying whether or not those properties can be retrieved via casting). Iteration could terminate early once a suitable match is found. I know yield is used to define iterators which could make this confusing, but its usage is limited enough that it could be OK. Another keyword in its place could make this more acceptable, though.
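For comparison, something close to this can be approximated today without new syntax by materialising just enough of the sequence and matching the resulting array. This is a sketch of that workaround, not of the proposed yield switch; the Numbers and Describe helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An endless sequence, so calling Count() on it is not an option.
static IEnumerable<int> Numbers()
{
    for (int i = 1; ; i++) yield return i;
}

// Take one element more than the longest fixed pattern needs,
// so "exactly two" can be told apart from "two or more".
static string Describe(IEnumerable<int> items) =>
    items.Take(3).ToArray() switch
    {
        [var first, _] => $"exactly two items, first is {first}",
        [var first, ..] => $"first item is {first}",
        [] => "empty",
    };

Console.WriteLine(Describe(Numbers()));            // first item is 1
Console.WriteLine(Describe(new[] { 7, 8 }));       // exactly two items, first is 7
Console.WriteLine(Describe(Array.Empty<int>()));   // empty
```

Iteration stops after at most three elements, which is the early-termination behaviour the comment asks for, at the cost of a small array allocation.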

Shorthand for Length or Count could use the # symbol, inspired a bit from the Lua length operator # (section 2.5.5).

Example:

List<int> list = GetSomeList();
var message = list switch
{
    # is 4 => "Length is 4",
    # is > 0 and < 4 => "Length is positive, but less than 4",
    _ => "Nope"
};

var motd = #list is 5 and [2] switch 
{
    "dog" => "Bark",
    "cat" => "Meow",
    "frog" => "Ribbit",
    _ => "The farm is quiet today."
};

This would work well with the example above, since it doesn't need to know the underlying type beyond it being some collection that can report how many items it contains. However, it could limit the options for an explicit "at-index" operator like the one suggested in posts above.
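For reference, the two # examples can be approximated with existing syntax: property patterns on Count, and fixed-length list patterns (assuming C# 11). The helper names below are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Property patterns on Count play the role of the proposed "#" shorthand.
static string DescribeCount(List<int> items) => items switch
{
    { Count: 4 } => "Length is 4",
    { Count: > 0 and < 4 } => "Length is positive, but less than 4",
    _ => "Nope",
};

// A five-element list pattern checks the count and element 2 in one step.
static string Motd(List<string> animals) => animals switch
{
    [_, _, "dog", _, _] => "Bark",
    [_, _, "cat", _, _] => "Meow",
    [_, _, "frog", _, _] => "Ribbit",
    _ => "The farm is quiet today.",
};

Console.WriteLine(DescribeCount(new List<int> { 1, 2, 3, 4 }));             // Length is 4
Console.WriteLine(DescribeCount(new List<int> { 1 }));                      // Length is positive, but less than 4
Console.WriteLine(Motd(new List<string> { "a", "b", "cat", "d", "e" }));    // Meow
```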

This is a bit out there, but with regards to the spread/slice ambiguity, could we use already-existing query syntax to fill in the cracks?

from could be used as a keyword that would expect to be followed by a range, and implicitly convert single digits into index selectors, instead of the ... spread operator.

Example:

List<int> list = GetSomeList();

var message = list switch
{
    // could be confusing at first glance, but not too bad
    [2..5] is [3, 4, 5] => "",

    // with spread operator
    ...[2..5] is [3, 4, 5] => "",

    // with "from" keyword as prefix
    from [2..5] is [3, 4, 5] => "",

    _ => "default"
};
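For comparison, the "[2..5] is [3, 4, 5]" idea can already be expressed on arrays by slicing first and then matching, or with discards in a single list pattern. This is a sketch assuming C# 11; an array is used because, as far as I know, List<T> has no range indexer (GetRange would be needed there).

```csharp
using System;

int[] values = { 1, 2, 3, 4, 5, 6 };

// Slice with a range expression, then test the slice with a list pattern.
// The length guard avoids an out-of-range exception on short inputs.
bool middleMatches = values.Length >= 5 && values[2..5] is [3, 4, 5];

// The same check as a single list pattern with leading discards.
bool samePattern = values is [_, _, 3, 4, 5, ..];

Console.WriteLine(middleMatches);  // True
Console.WriteLine(samePattern);    // True
```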

The strength of its clarity comes in the nesting:

var moreInvolvedPatterns = list switch
{
    // explicit element with discard
    [int first, _, [2..5]] is [1, 3, 4, 5] => "",

    // with spread operator and no discard or sub-bracket
    [0, ...2..5] is [1, 3, 4, 5] => "",

    // spread operator with sub-bracket
    [0, ...[2..5]] is [1, 3, 4, 5] => "",

    // with "from" keyword as internal selector for sub-collections, using comma to include
    [0, from 2..5] is [1, 3, 4, 5] => "",

    // with "not" keyword to exclude element(s), no "from"
    [0..5 not 1] is [1, 3, 4, 5] => "",
    [0..10 not 1 and not 5..10] is [1, 3, 4, 5] => "",
    [0..10 not (1 or 5..10)] => "",

    // with "and" to include sub-collections
    [0 and 2..5] is [1, 3, 4, 5] => "",

    // with "and", "not", and "from" keywords together
    [0 and from 2..5] is [1, 3, 4, 5] => "",
    [0..10 not 1 and not from 5..10] is [1, 3, 4, 5] => "",
    [0..10 not (1 or from 5..10)] => "",

    _ => "default"
};

Example outside of a switch with from, and, and not:

// a bit weird!
var stuff = list[0..10 not 1 and not from 5..10];

As an alternative to from, or an additional keyword to aid in pattern matching to sub-collections, we could implement a take keyword:

var item = list switch
{
    take [2..5] is [3, 4, 5] => "",
    [0 take 2..5 and 9] is [1, 3, 4, 5, 10] => "",
    _ => "default" 
};

// or

var items = list switch
{
    [0..10 not (1 or from 5..10)] is [1, 3, 4, 5] => take [5..10],
    _ => default
};

Going all-out with the LINQ-spiration, you could even add keywords like any or all, which could be beneficial for potential dictionary pattern-matching paradigms in the future:

var item = list switch
{
    // "any" and "all" after the collection and before "is"
    [0..10 and from 30..50] any is 0 => "One of the items is 0",
    [0..10 and from 30..50] all is 0 => "All of the items are 0",

    // "any" and "all" before the collection
    any [0..10 and from 30..50] is 0 => "One of the items is 0",
    all [0..10 and from 30..50] is 0 => "All of the items are 0",

    _ => "default"
};

Nailing the syntax of this feature is a huge challenge for sure. @mpawelski had a great point about symmetry further up in the thread, but the concerns about too many dots in a spread operator are also very legitimate when we're playing with the range operator. I'm not sure if introducing more contextual keywords is the most desirable direction for this feature, but I personally find it a lot less difficult to read and understand keywords quickly vs too many symbols in our patterns.

I hope some of this post was helpful. Keep up the good work!

Elfocrash commented 2 years ago

I've been playing around with the proposal and I noticed that when the slice pattern is used, a new array is allocated for that slice via the RuntimeHelpers.GetSubArray method. I assume this is done because if an ArraySegment (or a Span) was offered from the original array, the mutation would cause potential unintended behavior.

I don't have data to back this up, but at least in my use cases for the feature, I would almost always use the array data in a read-only fashion to produce another result instead of mutating the incoming slice. I understand that from a language design standpoint, your decision to allocate the new array makes sense; however, do you think it's possible to allow the user to define when the slice can be a segment of the original array so we don't have to allocate the new slice? Maybe when .. ArraySegment<int> is used instead of .. int[] in the slice pattern?

alrz commented 2 years ago

I've been playing around with the proposal and I noticed that when the slice pattern is used, a new array is allocated for that slice via the RuntimeHelpers.GetSubArray

It's because it's an established pattern in dotnet/runtime for Slice to return the same type as the containing type. You could use AsSpan before matching a sliced list against an array.
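The difference between the two input types can be observed by writing through the slice. This is a minimal sketch assuming C# 11; the variable names are illustrative.

```csharp
using System;

int[] array = { 1, 2, 3, 4 };

// Matching the array directly: the slice is a fresh int[] copy.
if (array is [1, .. var copy, 4])
{
    copy[0] = 99;
}
int afterCopyWrite = array[1];

// Matching a span over the same array: the slice is a view, with no new array.
if (array.AsSpan() is [1, .. var view, 4])
{
    view[0] = 99;
}
int afterViewWrite = array[1];

Console.WriteLine(afterCopyWrite);  // 2  (the original is untouched)
Console.WriteLine(afterViewWrite);  // 99 (the write went through to the original)
```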

feO2x commented 2 years ago

@alrz Yeah, but the issue is the allocation of a new array. To use this feature in performance-sensitive scenarios, matching into a Span<T> or ReadOnlySpan<T> (or even the memory equivalents) would be great. As @Elfocrash mentioned, ArraySegment might also be an option, but I'm pretty sure that Span<T> and ReadOnlySpan<T> are more optimized.

HaloFour commented 2 years ago

@feO2x

To use this feature in performance-sensitive scenarios, matching into a Span<T> or ReadOnlySpan<T> (or even the memory equivalents) would be great.

You're in luck, those all work, at least per the latest bits in the feature branch:

SharpLab

using System;

int[] array = new int[] { 1, 2, 3, 4 };
if (array is [ 1, .. int[] arraySlice, 4 ]) {
}

Span<int> span = array.AsSpan();
if (span is [ 1, .. Span<int> spanSlice, 4 ]) {
}

ReadOnlySpan<int> ros = span;
if (ros is [ 1, .. ReadOnlySpan<int> rosSlice, 4 ]) {
}

I'm being explicit with the type of the slice, but that is also the type if you were to use var instead.

ArraySegment<T> also works, but Memory<T> doesn't:

error CS0021: Cannot apply indexing with [] to an expression of type 'Memory<int>'

emperador-ming commented 2 years ago

This is awesome. It would be great and more succinct if 'var' keyword were optional inside brackets.

int Add(List<int> list) => list switch
{
    [] => 0, 
    [head, ..tail] => head + Add(tail) 
};

HaloFour commented 2 years ago

@emperador-ming

That's a normal subpattern, it's not special for list patterns. C# always requires var or the type name for variable patterns and that is intentional.
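With var written out, the recursive example compiles under C# 11. This sketch uses int[] rather than List<int> because, as far as I know, List<T> offers neither a range indexer nor a Slice method, so ".. var tail" would not bind there.

```csharp
using System;

// Recursive sum via head/tail decomposition; var is required on both bindings.
static int Add(int[] list) => list switch
{
    [] => 0,
    [var head, .. var tail] => head + Add(tail),
};

Console.WriteLine(Add(new[] { 1, 2, 3, 4 }));  // 10
```

Each step slices a new array, so this is illustrative rather than efficient.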

emperador-ming commented 2 years ago

@emperador-ming

That's a normal subpattern, it's not special for list patterns. C# always requires var or the type name for variable patterns and that is intentional.

Quite verbose.

CyrusNajmabadi commented 2 years ago

Quite verbose.

Your code already has meaning :) head may be a constant which means the list that starts with that constant value.
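A short sketch of that existing meaning, assuming C# 11: when a constant named head is in scope, a bare head in the pattern is a constant pattern, not a new variable. The names are illustrative.

```csharp
using System;

const int head = 1;
int[] startsWithOne = { 1, 2, 3 };
int[] startsWithTwo = { 2, 3 };

// "head" resolves to the constant, so the first element is compared against 1.
bool first = startsWithOne is [head, ..];
bool second = startsWithTwo is [head, ..];

// With "var", a new variable is declared and any first element matches.
bool third = startsWithTwo is [var h, ..];

Console.WriteLine(first);   // True
Console.WriteLine(second);  // False
Console.WriteLine(third);   // True
```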

theunrepentantgeek commented 2 years ago

Quite verbose

Very often the opposite of verbose is not concise, but cryptic.

Nobody wants things to be more verbose than needed, but trying to go further than that just results in obfuscated code that's needlessly difficult to read and maintain.

FaustVX commented 2 years ago

@emperador-ming

It would be great and more succinct if 'var' keyword were optional inside brackets.

The var keyword (or typename) is essential in C#: var (or typename) before an identifier means we declare a new variable. Without it, no one can know a new variable is declared.

emperador-ming commented 2 years ago

Since list patterns are a typical functional-programming construct, I'm just proposing what is common in functional languages. I fail to see in what manner, in this precise context, making var optional obfuscates code.

CyrusNajmabadi commented 2 years ago

I fail to see in what manner, in this precise context, making var optional obfuscates code.

@emperador-ming As i mentioned above, that code already has meaning. It means "match if the list contains this value as the first element in it". Thanks! :)

dotnet/csharplang, Proposal: list pattern #3435