dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.04k stars 4.03k forks source link

Support pattern-matching in let clauses in Linq #6877

Closed gafter closed 7 years ago

gafter commented 8 years ago

Given the let-statement proposed in the draft pattern-matching specification for #206, it has been proposed by @alrz to extend the let clause in Linq to support pattern-matching as well:

because of the similarities with let statement I think that would be reasonable to support patterns in it.

from person in list
let Student { Grade is var grade, Name is var name } = person
where grade > 65
select new { Grade = grade, Name = name };
...
// equivalent to
...
from person in list
let $x = person switch(
  case Student { Grade is var grade, Name is var name }: new {grade, name},
  case *: null
)
where $x?.grade > 65        
select new { Grade = $x.grade, Name = $x.name }   
...
// complete patterns
...
from tuple in list
let (var x, var y) = tuple
where x > y
select x + y
...
// equivalent to
...
from tuple in list
let $x = tuple case (var x, var y) : new {x,y}
where $x.x > $x.y
select $x.x + $x.y

etc.

It isn't clear how this would be expected to work when the pattern is fallible.

mattwar commented 8 years ago

You could make it also where as a where clause and omit anything that doesn't match.

HaloFour commented 8 years ago

@gafter

I agree, letting let do double-duty here seems intuitive. But I also agree that failed patterns pose a potential problem. Having the failed patterns omitted automatically seems a little too implicit. Maybe support something like else continue?

from person in list
let Student { Grade is var grade, Name is var name } = person else continue
where grade > 65
select new { Grade = grade, Name = name };
gafter commented 8 years ago

@mattwar I agree, but which Linq method would it translate into? .Where doesn't have a mechanism for adding new variables, and .Let doesn't have a mechanism for filtering.

alrz commented 8 years ago

@HaloFour that's exactly the issue I was thinking about, it implicitly skips failed patterns. but else continue seems fair.

HaloFour commented 8 years ago

With let already providing a bit of syntactic voodoo in LINQ I think that it's reasonable that it would be translated out into multiple extension method calls.

Going with implicit filtering:

from person in list
let Student { Grade is var grade, Name is var name } = person
where grade > 65
select new { Grade = grade, Name = name };

converted into:

list
    // beginning of let expansion
    .Select(person => person switch (
        case Student { Grade is var grade, Name is var Name } :
            (true, person, grade, name)
        case * :
            (false, person, default(int), default(string))
    ))
    .Where(tuple => tuple.Item1)
    .Select(tuple => new { person = tuple.Item2, grade = tuple.Item3, name = tuple.Item4 })
    // end of let expansion
    .Where(projected => projected.grade > 65)
    .Select(projected => new { Grade = projected.grade, Name = projected.name });
alrz commented 8 years ago

I'd say it could be Let:

list
.Let(person => person switch( ... ))
.Where($x => $x.grade > 65)
.Select($x => new { Grade = $x.grade, Name = $x.name });

public static IEnumerable<TResult> Let<T, TResult>(
  this IEnumerable<T> source,
  Func<T, TResult> selector)
{
  foreach(var item in source) {
    var result = selector(item);
    if (result != null) yield return result;
  }
}
gafter commented 8 years ago

@alrz What does result != null mean when you don't know if it is a reference type or not?

alrz commented 8 years ago

@gafter It will be an anonymous type anyway, I forgot to put the class constraint.

gafter commented 8 years ago

@alrz No, it is not required to be an anonymous type. We were hoping to start using tuples in many cases for translating let to reduce GC pressure.

HaloFour commented 8 years ago

@gafter Great point. The compiler could spit out a private static function that accepts the original range variable (or a tuple of all current range variables) and returns an IEnumerable<tuple> of the original range variable(s) plus the variable patterns, then yield return a single result on a match.

Going with implicit filtering:

from person in list
let Student { Grade is var grade, Name is var name } = person
where grade > 65
select new { Grade = grade, Name = name };

converted (somewhat) into:

IEnumerable<(Person, int, string)> $match(Person person) {
    switch (person) {
        case Student { Grade is var grade, Name is var Name }:
            yield return (person, grade, name);
        default: yield break;
    };
};

list
    .SelectMany($match)
    .Where(projected => projected.Item2 > 65)
    .Select(projected => new { Grade = projected.Item2, Name = projected.Item3 });
alrz commented 8 years ago

@gafter I was assuming the current implementation, I did mention this in the tuple issue, too. By the way, that's good to know that it _is_ going to use tuples. :+1:

gafter commented 8 years ago

@HaloFour The Linq pattern is not tied to IEnumerable... it would be a shame if this pattern-matching support worked for IEnumerable but no other types.

HaloFour commented 8 years ago

@gafter Apologies for botching the example. I didn't imply that it would be tied to IEnumerable, aside the fact that let would use it to return either 0 or 1 results per attempted match, which I thought was what you were going for.

HaloFour commented 8 years ago

@gafter Wait, my example was correct. I'm not sure where the confusion is?

bbarry commented 8 years ago
from person in list
let Student { Grade is var grade, Name is var name } = person
where grade > 65
select new { Grade = grade, Name = name };

into

list
    .SelectMany(person => person switch (
        case Student { Grade is var grade, Name is var name } : 
            new [] { (person, grade, name) },
        case * : 
            Enumerable.Empty<...>()
    ))
    .Where(projected => projected.Item2 > 65)
    .Select(projected => new { Grade = projected.Item2, Name = projected.Item3 });

?

Is it possible to avoid the allocation for that array?

HaloFour commented 8 years ago

@bbarry

The compiler wouldn't be forced to impose the restriction of closures not being iterators on itself. I assume that the compiler would emit a private static iterator function and then pass a delegate to that function to SelectMany.

Your example using arrays is much clearer than mine in illustrating the concept, I think.

orthoxerox commented 8 years ago

What about this approach?

var students =
from person in people
where person is Student student
select student;
//is transformed into
var students = people
.Select(person => (person is Student student)?(true, student):(false, default(Student)))
.Where(tuple => tuple.Item1 = true)
.Select(tuple => tuple.Item2);
HaloFour commented 8 years ago

@orthoxerox Looks pretty similar to my first suggestion for an expansion, although you're using is with a ternary rather than the switch expression.

gafter commented 8 years ago

Why are all you folks tying this to Enumerable (e.g. using Enumerable.Empty or having the compiler generate a method that performs yield return)? The Linq feature can be used with any type that fits the API pattern (e.g. PLINQ, Linq to SQL, IObservable, etc). If we have this construct force the result to be Enumerable, then we've broken one of the important features of Linq.

gafter commented 8 years ago

@HaloFour Your first proposed expansion (also @orthoxerox 's most recent proposed expansion) would work.

bbarry commented 8 years ago

@gafter I thought you said SelectMany above; I was trying to figure out how that would have worked. I think @HaloFour's tuple expansion https://github.com/dotnet/roslyn/issues/6877#issuecomment-158172883 is better. It seems odd to have let reduce the number of items in the statement...

gafter commented 8 years ago

@bbarry I thought SelectMany could be used, but I tried and now I don't think so.

HaloFour commented 8 years ago

@gafter Indeed, implementing this via SelectMany would lock it to IEnumerable<T> under the hood as the mechanism to return zero or one value. It was your idea, we were following it. If this tie to IEnumerable<T> is unacceptable then that's fine, but you forgot to mention that before chastising us for chasing your concept.

Anyway, that is all implementation details. I'd be more concerned with hammering down the semantics and syntax (and variations thereof) at this stage and worrying about how exactly that could be expanded later. Particularly of interest would be any capability of this feature functioning as a filter as well as a projection, either by default or through additional syntax.

Aside that, it is nice to hear that LINQ may take advantage of tuples for let projections.

alrz commented 8 years ago

@HaloFour SelectMany is not tied to IEnumerable<T>, implement it for your desired type and it will work

static M<U> SelectMany<T, U>(this M<T> m, Func<T, M<U>> k)

No need to mention, it's the bind operator.

gafter commented 8 years ago

@alrz How would you use SelectMany to implement a pattern-based let clause, i.e. so that it can return one result or none, without tying it to IEnumerable?

alrz commented 8 years ago

@gafter you mean tying it to something else like maybe? because there should be something to represent one result or none.

gafter commented 8 years ago

@alrz I don't have a particular solution in mind.

alrz commented 8 years ago

@gafter Indeed, it can be implemented using SelectMany :smile: Here's a particular solution,

public static M<U> SelectMany<T, U>(this M<T> source, Func<T, U?> k)
    where U : struct
{
    return from item in source
           let result = k(item)
           where result.HasValue
           select result.Value;
}

public static M<V> SelectMany<T, U, V>(this M<T> m, Func<T, U?> k, Func<T, U, V> s)
    where U : struct
    where V : struct
{
    return m.SelectMany(x => k(x).SelectMany(y => new V?(s(x, y))));
}

public static U? SelectMany<T, U>(this T? m, Func<T, U?> k)
    where T : struct
    where U : struct
{
    return m.HasValue ? k(m.Value) : default(U?);
}

I simulated Maybe a with Nullable<T> for demonstration, so it can only be used with value types (see below), substitute it with Option<T> and it works for all types.

I'm using higher-kinded generic type M<T> you can substitute it with any type that has implemented SelectMany like IEnumerable<T> and it works for that particular type.

As an example,

// assuming
var list = new object[] { 1, 2.0, 3 };

// the following
var q = from item in list
        let double d = item
        select d;

// translates into
var q = from item in list
        from d in item is double num ? new double?(num) : default(double?)
        select d;

The return value of pattern matching can be an Option<T> where T is a tuple or anonymous type, to be used in the consequent LINQ operators.

The thing is, in async sequences, we have three container types to combine: Task<T>, IEnumerable<T> and Option<T> which complicates things exponentially! In that case, I think higher-kinded generics would be really helpful.

gafter commented 8 years ago

@alrz None of your SelectMany methods are part of the existing Linq pattern described by the C# language specification. I meant using the same SelectMany methods that must already be implemented for the current Linq pattern - for example, the ones in Enumerable. No fair mixing monads.

alrz commented 8 years ago

@gafter Well, it cannot be done I guess! They are not new implementations, I just implemented bind for Nullable<T> (third method) and a transformer for it (first method) and a helper function to be used in query expressions (second method). These are part of functional languages like Haskell and they are well integrated in the language, so you don't need to implement them yourself. The fact that Nullable<T> is not introduced as a monad in the language, restrict us in this kind of situations. Anywho, I'm really interested to see how this can be done using the existing SelectMany without tying it to a particular type.

gafter commented 8 years ago

@alrz I'm guessing it cannot be done.

HaloFour commented 8 years ago

@alrz All you're really doing is implementing the same Select/Where/Select implementation from above. The C# compiler can already emit tuples (or anonymous types) to project an analog to Option<T> so this would require no new BCL types or methods.

Also, to note, that second Select wouldn't be necessary as the compiler can simply choose to ignore the fact that the success flag is still projected. If the LINQ query performs another let match or select projection it could omit it at that point.

alrz commented 8 years ago

@HaloFour This is just functional way of doing things (as if they were existed in the language for Option<T> etc) so they wouldn't be special just for implementing let in query expressions. If Select/Where is the prefered way, I do agree that it's more optimized and straightforward solution for this. In that case I think expanding where ... is ... would be more sensible than translating let to a where clause (in case of a fallible pattern). So let would be used for complete patterns, and where ... is ... otherwise.

HaloFour commented 8 years ago

@alrz

My samples assume an implicit filtering of failed patterns, something that I don't know has been decided upon. If failed patterns wouldn't be filtered out then there would be no call to Where.

Well where ... is ... already means something today and doesn't convey the capture of those range variables. If you're arguing that declaration expressions or pattern variables should be lifted to range variables that might be a different proposal.

alrz commented 8 years ago

Well where ... is ... already means something today and doesn't convey the capture of those range variables.

Yes it does, as for is itself. But there weren't any "variables" for is before. Don't forget we have two kind of is operator,

relational-expression:relational-expression is typerelational-expression is complex-pattern

So, I'm saying that let can be used for complete patterns like

from item in list
let (var x, var y) = item
...

And the second form of is for fallible patterns.

from item in list
where item is Foo foo
...

And for the existing form of is it would behave as it always has.

HaloFour commented 8 years ago

@alrz

Pattern variables don't leak out of scope with is, though. Nor would they with where. That where clause could similarly combine other Boolean operations, including with declaration expressions, where it was determined that the scope would not exceed the where clause.

alrz commented 8 years ago

Now that you mentioned declaration expressions it's understandable for me that changing scoping just for pattern variables doesn't make sense. Since let clause variable scope is well defined I think that's the only option and for implicit skipping issue, as I said, else continue seems fair, as you suggested, but I'm not sure about using continue because it's already a statement and might be confusing.

gafter commented 8 years ago

Remember that the semantics of query expressions are defined mostly in terms of translation to invocation of methods with expression lambdas. Pattern variables will not leak out of expression lambdas. If we want to change the scoping we need to define a different translation.

alrz commented 8 years ago

@gafter

semantics of query expressions are defined mostly in terms of translation to invocation of methods with expression lambdas.

I presume that mostly means "except for let". I did take my suggestion back regarding expanding where. But still, I think it's not that illogical to have this behavior with let clause (without any change in scoping) — btw, I'm sure with introducing let statement for deconstruction, this one is expected as well. At least, it could just support complete patterns for deconstructing tuples (which I believe will be more commonly used than anonymous types in LINQ).

gafter commented 8 years ago

It isn't let in particular that is the exception. There are a few things that make it not quite a syntactic transformation:

  1. The query methods are "expected to have particular signatures and result types, as described in §7.16.3"
  2. "Assignment to range variables is not allowed in query expressions. However a C# implementation is permitted to not always enforce this restriction, since this may sometimes not be possible with the syntactic translation scheme presented here." ... Roslyn always enforces this.
  3. Some translation steps (like let and a non-initial select) involve the use of "transparent identifiers" which are described to be translated into the use of anonymous types. However, "An implementation of C# is permitted to use a different mechanism than anonymous types to group together multiple range variables." ... This gives us the flexibility to use tuples.
paulomorgado commented 8 years ago

I got a bit confused around this.

Part of this discussion is about deconstruction in LINQ queries giving a new meaning to the let keyword and match it with the "new" let keyword outside of LINQ queries, right?

How would this behave outside of LINQ?

let Student { Grade is var grade, Name is var name } = person;

Should it behave differently in LINQ queries? I think not.

The other of this discussion is about a condition introducing new "variables", right?

In order for the compiler to handle this without the need for introducing new signatures in Where, how about this?

where person is Student { Grade is var grade, Name is var name }

would be logically translated to:

let isMatch = person is Student { Grade is var grade, Name is var name }
where isMatch

or concretely to:

.Select(person => person is Student { Grade is var grade, Name is var name }
    ? (true, person, grade, name)
    : (false, person, default(Grade), default(Name)))
.Where(tuple => tuple.Item1)
.Select(tuple => (tuple.Item2, tuple.Item3, tuple.Item3))

The last Select is just to drop the condition value, but it could be kept until the very end and be reused in other similar patterns.

HaloFour commented 8 years ago

@paulomorgado The problem remains the scoping. If the pattern introduces new variables through variable patterns then in normal circumstances those variables would be out of scope as soon as the clause terminated. That would remain true using is within a let clause unless that behavior was defined. If that behavior would be reconsidered then this issue does go away:

from person in list
where person is Student { Name is var name, Grade is var grade } && grade > 65
select new { name, grade };

This is basically a continuation of the conversation at CodePlex except with patterns instead of variable declarations.

gafter commented 8 years ago

@paulomorgado The expression person is Student { Grade is var grade, Name is var name } is already an ordinary boolean expression that could be used in a lambda (or will be with the addition of pattern-matching), so I think we would be reluctant to give it a special meaning in a query's where clause.

paulomorgado commented 8 years ago

@HaloFour, doesn't the way I showed introduce those variables?

@gafter, the difference is the variables introduced. And I was not proposing to give a new meaning to the where clause. Was I?

HaloFour commented 8 years ago

@paulomorgado No, not without other changes to the semantics of the let clause. The only additional range variable it would introduce is isMatch. The others would end up going out of scope.

gafter commented 8 years ago

@paulomorgado Yes, you were proposing a change from the current specification for the translation of the where clause.

where person is Student { Grade is var grade, Name is var name }

would be logically translated to:

let isMatch = person is Student { Grade is var grade, Name is var name }
where isMatch

or concretely to:

.Select(person => person is Student { Grade is var grade, Name is var name }
    ? (true, person, grade, name)
    : (false, person, default(Grade), default(Name)))
.Where(tuple => tuple.Item1)
.Select(tuple => (tuple.Item2, tuple.Item3, tuple.Item3))

I say this is a change because the syntax after where is an expression, and the current spec already says what to do for where expression.

paulomorgado commented 8 years ago

OK. It's a "change", not a "CHANGE".

@gafter, on hindsight, that last Select is kind of stupid. But, apart from that and the needed introduction of the need two variables (grade, name), what's the difference from translating this:

from i in Enumerable.Range(0, 10)
let c = i % 2 == 0
where c
select I

into this (now using the tuple proposed syntax)?

Enumerable.Range(0, 10)
    .Select(i => (i, i % 2 == 0))
    .Where(((i, c)) => c)
    .Select(((i, c)) => i)
gafter commented 8 years ago

@paulomorgado the translation above is using precisely the set of methods specified in the language specification, and introduces precisely the set of range variables specified in the language specification, so it appears to comply with the spec (which allows an implementation to use other techniques for representing transparent identifiers).

Incidentally, the last two lambdas are syntax errors as written. I think it needs to be this:

Enumerable.Range(0, 10)
    .Select(i => (i, i % 2 == 0))
    .Where(t => t.Item2)
    .Select(t => t.Item1)
paulomorgado commented 8 years ago

@gafter, I was trying to be "tuply".

alrz commented 8 years ago

@paulomorgado It probably would be

Enumerable.Range(0, 10)
    .Select(i => (i, i % 2 == 0))
    .Where(t => t case (*, var c): c)
    .Select(t => t case (var i, *): i)

// or

Enumerable.Range(0, 10)
    .Select(i => (i, c: i % 2 == 0))
    .Where(t => t.c)
    .Select(t => t.i)

Since #6067 is not under consideration.