dotnet / vblang

The home for design of the Visual Basic .NET programming language and runtime library.
289 stars 65 forks source link

Pattern Matching #124

Open AnthonyDGreen opened 7 years ago

AnthonyDGreen commented 7 years ago

Some people have suggested to me that pattern matching doesn't feel like a VB feature. What they mean is "Pattern matching sounds like some advanced programming stuff". I disagree. I think pattern matching is very much so a VB feature. In fact, I suggest that most VB programmers, myself included, have spent most of their careers looking at data--be it DB records, or string input, or messages from one device or another--and testing the shape of that data against known patterns, separating structure from content, and extracting select content to fit our needs. That's what every use of the Like operator, the InStr function, and most If' and Select statements are really doing. They're just going about it in a rather tedious (imperative) way. Pattern Matching is a powerful feature we can add to VB to make the same kinds of tasks we've always been doing much more concise, readable, and declarative.

I propose we extend existing Case statements to support a new kind of clause, the Match or Matches clause (depending on your grammatical proclivities) of the form:

Case Match pattern [When expression]

Where pattern is one of:

The wildcard pattern * Matches all values, including null.

The variable pattern identifier [?] [As typename]

The function pattern expression ( [ pattern1 [, pattern2, ...] ] )

Example:

Select Case "1"
    Case Integer.TryParse(value) When value > 0
End Select

Order of execution in the above example is as follows:

The tuple pattern ( pattern1 [, pattern2, ...] )

However, the list doesn't end here. In fact, it's my belief that the tuple pattern is just the first example of a new trend in the language that with each new expression we add for constructing an object, there is a corresponding pattern for decomposing that object that mirrors the construction syntax. For example, one can imagine (and I have) patterns based on decomposing arrays based on the array literal syntax, decomposing strings based on the string interpolation syntax, and decomposing XML fragments based on the XML literal syntax. The JSON literal proposal (#101) also demonstrates this.

The entire Match (or Matches) clause succeeds if the specified pattern succeeds AND the expression provided in the When clause (if provided) evaluates as true. The expression may refer to variables introduced in pattern.

AdamSpeight2008 commented 7 years ago

Another possible option is to extend the usage cases of Like.

zspitz commented 6 years ago

Pattern matching is absolutely a VB.NET feature, especially considering it already exists in limited form as part of Select Case -- To. Is <comparison operator> x etc.

I would suggest that any pattern matching syntax should have the following features:


Proposed spec (using bits from the ANTLR-like grammer available from here):

PatternExpression
    : Pattern ('When' BooleanExpression)?
    ;

Pattern
    // patterns with subpatterns
    : Pattern ',' Pattern                    // OR pattern
    | '(' Pattern (',' Pattern)* ')'         // Tuple pattern
    | 'Not' Pattern                          // NOT pattern

    // patterns without subpatterns
    | '*'                                    // Matches any value, including Nothing
    | Literal                                // Matches a literal value
    | Identifier                             // Existing Identifier pattern -- when identifier already exists in scope; tests for value/reference equality
    | 'Of'? TypeName                         // Type check pattern -- when type name already exists in scope
    | Identifier ('As' TypeName)?            // Variable pattern; introduces a new variable in child scope
                                             // The type of the variable (if not specified) is the same as the subject of the pattern match
    | 'Is'? ComparisonOperator Expression    // Comparison pattern
    | 'Like' StringExpression                // Like pattern
    | Expression 'To' Expression             // Range pattern
    | Expression                             // Equality pattern -- value/reference equality test against Expression
    ;

// usage in Select Case` looks like this:
// replaces existing CaseStatement
CaseStatement
    : 'Case' PatternExpression StatementTerminator
      Block?
    ;

and theoretically, it could be used anywhere BooleanExpression is used; with the caveat that since the pattern is greedy, either a PatternExpression or a BooleanExpression can be used, but not both.

BooleanOrPatternExpression
    : BooleanExpression
    | PatternExpression
    ;

// If...Then..ElseIf blocks
BlockIfStatement
    : 'If' BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
      ElseIfStatement*
      ElseStatement?
      'End' 'If' StatementTerminator
    ;

ElseIfStatement
    : ElseIf BooleanOrPatternExpression 'Then'? StatementTerminator
      Block?
    ;

LineIfThenStatement
    : 'If' BooleanOrPatternExpression 'Then' Statements ( 'Else' Statements )? StatementTerminator
    ;

WhileStatement
    : 'While' BooleanOrPatternExpression StatementTerminator
      Block?
      'End' 'While' StatementTerminator
    ;

DoTopLoopStatement
    : 'Do' ( WhileOrUntil BooleanOrPatternExpression )? StatementTerminator
      Block?
      'Loop' StatementTerminator
    ;

// cannot introduce variables in to child scope here
DoBottomLoopStatement
    : 'Do' StatementTerminator
      Block?
      'Loop' WhileOrUntil BooleanOrPatternExpression StatementTerminator
    ;

ConditionalExpression
    : 'If' OpenParenthesis BooleanOrPatternExpression Comma Expression Comma Expression CloseParenthesis
    | 'If' OpenParenthesis Expression Comma Expression CloseParenthesis
    ;

Some additional points:

Related issues:

307 , #191 -- When expressions

304

24

160 -- Use of pattern matching syntax to define a BooleanExpression

23 -- Could be covered by a combination of OR pattern and type-check pattern

Pattern matching as a BooleanExpression would obviate the need for a dedicated type-checking syntax (#277 @bandleader) and provide inferred typing on introduced variables (#172).

Specific patterns: #141, #140, #139

Pinging @KathleenDollard @AdamSpeight2008 @bandleader @gafter @AnthonyDGreen @ericmutta

zspitz commented 6 years ago

An alternative spec for Pattern, where no pattern depends on context -- no need to check if there is an existing identifier or typename in the parent context:

Pattern
    // patterns with subpatterns
    : Pattern ',' Pattern                    // OR pattern
    | '(' Pattern (',' Pattern)* ')'         // Tuple pattern

    // patterns without subpatterns
    | '*'                                    // Matches any value, including Nothing
    | Literal                                // Matches a literal value
    | 'Of' TypeName                          // Type check pattern -- matches when subject is of TypeName
    | Identifier 'As' TypeName               // Variable pattern -- introduces a new variable in child scope
    | 'Is'? ComparisonOperator Expression    // Comparison pattern
    | 'Like' StringExpression                // Like pattern
    | Expression 'To' Expression             // Range pattern
    | Expression                             // Equality pattern -- value/reference equality test against Expression
    ;

Using this syntax would mean that subpatterns would also be forced to introduce new variables in the same way:

Dim o As Object
'...
Dim flag = o Matches (x As Integer, y As String) When x > 15 And y.Length < 7

If o Matches x As Random, y As Dictionary(Of String, Integer) Then
    If x IsNot Nothing Then Console.WriteLine($"Next number: {x.Next}")
    If y IsNot Nothing Then Console.WriteLine(y("Key") * 12)
End If
zspitz commented 6 years ago

What would the inverse of this look like?

If Not value Matches 0 To 3, 99 Then return "Valid" Else Return "Invalid"
' or
If value DoesNotMatch 0 To 3, 99 Then return "Valid" Else Return "Invalid"
' or
If value DoesntMatch 0 To 3, 99 Then return "Valid" Else Return "Invalid"
' or
If value IsNotMatch 0 To 3, 99 Then return "Valid" Else Return "Invalid"
zspitz commented 6 years ago

Instead of a new Matches keyword for boolean-returning pattern expressions, we could extend Is / IsNot for this purpose. Currently, Is looks like this:

IsExpression
    : Expression 'Is' LineTerminator? Expression
    | Expression 'IsNot' LineTerminator? Expression
    ;

and we could extend it to any pattern:

IsExpression
    : Expression 'Is' LineTerminator? PatternExpression
    | Expression 'IsNot' LineTerminator? PatternExpression
    ;
bandleader commented 6 years ago

@zspitz

  1. I assume you mean "and PatternExpression can of course be an Expression, as one of the supported patterns, so existing Is (reference) checks would work." But wouldn't that require that Expression be wrapped inside a PatternExpression in the AST? If so, I think the Roslyn folks won't want to do that, as it would break things like code analyzers. However, we could instead make it (PatternExpression OR Expression).

  2. Also, although I'm all for it, there was a some resistance from posters here to re-using Is. I'm not sure what they had against it -- maybe because that implies that Expression is a supported pattern, and they are worried about a case where there's a type and a variable with the same name? But if that's the opposition, that applies equally to Matches; the real question is whether Expression is a supported pattern.

zspitz commented 6 years ago

@bandleader

would break things like code analyzers.

Doesn't every spec change have the potential to similarly break existing code analyzers?

Also, are you proposing that Expression not be supported as a pattern? Because there are nested patterns, and just as this:

Dim rnd As New System.Random
If o Is rnd Then ...

is currently supported, so too this:

If o Is (rnd, 15) Then ...

should also be supported.

some resistance from posters here to re-using Is

My mental model for Is (or how Is should be WRT pattern matching) comes from the corresponding natural-language use of is.

I think that just as is in natural language isn't constrained to a particular form on the RHS, so too in VB.NET Is as an operator could be expanded to mean matches whatever is on the RHS. Conflicts between types and variable names in patterns could be resolved by an explicit keyword (along the lines of is Spot vs is a beagle).

bandleader commented 6 years ago

@zspitz I'm with you on supporting expressions (and I like Spot). I just seem to remember some opposition due to conflicts. If resolved via a keyword (do you mean something like Is OfType T?), then in a stalemate with no keyword, the expression would have to win out, so as not to break existing code -- something very important to the team at MS, even in very rare scenarios. Again, I think your proposal works well.

As far as each spec change breaking code analyzers -- no, it's only if the structure of the AST changes. We can add new keywords, or even new AST node types; we avoid, however, causing an existing piece of code (i.e. someVar Is someRef) to be reinterpreted with different AST node types/structure (i.e. pattern matching on an expression instead of reference equality), even if it compiles to the same effect, as it breaks code analyzers (who are looking for Expression there and suddenly see a PatternExpression). Again, we could probably solve this using OR.

zspitz commented 6 years ago

@bandleader My initial thought (described here) was to resolve conflicts based on context:

but I suspect that is too over-clever for VB.NET.

bandleader commented 6 years ago

@zspitz That should indeed solve the problem -- since 1 will fire before 2, it won't break code. (Though the first bullet point has to be done without wrapping the expression in a PatternExpression, so as not to break analyzers.)

But you're right that it might be "too clever for VB.NET" -- i.e. isn't really VB-like or VB-friendly -- doesn't really fit the VB requirements for "seeing the code is seeing what it does." I could see the VB team opposing it on two counts:

  1. Having a single syntax to match both on values and types decreases clarity -- especially since Is (and Case) currently match on values. Instead, they may prefer adding the keyword OfType for this, as in Is OfType T and Case OfType T.
  2. Similarly, having a single syntax for matching on a variable, and creating a variable. i.e. does Case x match on value x or create a variable x? Depends on context. Instead, it is probably more VB-like to use a keyword for this; as some have suggested, it could be Dim x. For example Case OfType IEnumerable Dim lst.

This way, the intent and behavior is evident from the pattern itself, and doesn't depend on context. (Also, when the intent is to match on the value of an existing variable, it prevents mistyping the variable name, and also allows us to provide Intellisense for the name.)

bandleader commented 6 years ago

As per my above comment, I do prefer your "context-free" suggestion (as described here) as fitting much better into VB, where it's important to us that intent be reflected in words, and "seeing the code is seeing what it does."

However, I would change the following:

zspitz commented 6 years ago

@bandleader

OfType might be a more explicit choice than Of, but Of already is used for this purpose in calls to generic methods and classes.

Another possibility for the type-checking pattern might be As <TypeName>, which would align very nicely with the type part of the variable pattern (MyIdentifier As <TypeName>).


Dim feels to me like an instruction to do something -- define a variable of a given type. I think this goes against the grain of pattern matching, where we're asking first and foremost does the LHS match the pattern expressed on the RHS? It's true that variable usage within the pattern -- within the pattern block, and within the When clause -- is an important part of pattern matching, but the first step is the match against the pattern. I think MyIdentifier As TypeName better expresses a pattern, as opposed to the use of Dim.

Might be related, but Dim only appears at the beginning of a line. I think that the variable introduced here is closer to the variable introduction in Sub/Function definitions, which doesn't use an explicit Dim keyword.

zspitz commented 6 years ago

@bandleader

Also, this way it combines much more naturally with the type-check pattern. Case T -> Case T Dim x.

This is only true if we're relying on context to differentiate between the expression pattern and the type-check pattern (if the single-identifier expression matches an identifier in context, then it's an expression pattern; otherwise a type-check pattern).

zspitz commented 6 years ago

@bandleader

also meshes well with existing TypeOf operator

Since IMHO the TypeOf ... Is ... syntax is horrible (#277), I don't feel beholden to prefer a syntax because it aligns well with TypeOf ... Is ....

JustNrik commented 6 years ago

I'd personally like pattern matching feature like this:

Dim obj As Object = "Test"

If TypeOf obj Is String str AndAlso str.Length > 0 Then
    ' Stuff
End If

' For select statement

Select TypeOf obj ' Or Select Case TypeOf obj
    Case Integer int
        ' Do stuff with int
    Case String str
        ' Do stuff with str
End Select

' This would be an equivalent of
Select Case True
    Case Integer.TryParse(obj, Out int)
    Case String.TryParse(obj, Out str)
End Select

' Or

Select Case True
    Case TypeOf obj Is Integer
        Dim int = DirectCast(obj, Integer)
    Case TypeOf obj Is String
        Dim str = DirectCast(obj, String)
End Select
zspitz commented 6 years ago

@JustNrik Your syntax limits pattern matching to type-check + cast-to-a-new-variable (which could be covered by #172, without introducing a new variable). It's important to realize that pattern matching in other languages is a far broader concept -- does the left-hand-side fit into the pattern described on the right-hand-side, with some parts of the pattern being named:

Select obj
    Case int As Integer
        ' Do stuff with int
    Case (int As Integer, s As String)
        ' Do stuff with int or s
End Select
bandleader commented 6 years ago

Dim feels to me like an instruction to do something -- define a variable of a given type. I think this goes against the grain of pattern matching, where we're asking first and foremost does the LHS match the pattern expressed on the RHS? I think MyIdentifier As TypeName better expresses a pattern, as opposed to the use of Dim. It's important to realize that pattern matching in other languages is a far broader concept -- does the left-hand-side fit into the pattern described on the right-hand-side, with some parts of the pattern being named:

@zspitz I hear you; you're viewing pattern matching how it's viewed in other languages (Scala etc.): "Does the LHS match the pattern on the RHS, where the RHS can also have parts which match all values (optionally limited by type) and just assign it to a variable." However, if that's the case maybe it shouldn't use Select Case, because Case x today expects x to be a value (or expression) which has to match the LHS, and not an identifier for a new variable that we'll assign to.

If what we're looking to do is extend Select Case, then we shouldn't change the meaning/context of Case x. And as I mentioned, I think even Case x As Integer doesn't look any less like matching based on existing x or any more like the assignment you propose it should be come. That's why I suggested Case Integer Dim x.

Doing Scala-style pattern matching in VB

That said, I have no problem with switching to the Scala idea of pattern matching, but in that case we should not re-use Select Case; maybe make a new keyword like Match, or even keep Select Case but use Matches or Case Matches instead of Case.

Also, somebody suggested that pattern matching should be implemented as an operator (expression-based) as opposed to a control statement that runs other statements. I personally would like this a lot. Reduces a lot of boilerplate in which you set (e.g.) message = in every single Case, instead of just returning the value you need. And of course, can be used within an existing expression much more easily.

Dim message = "You have " & emailCount Match(
    0: "no new email."
    1: "one new email."
    x As Integer When x > 1: $"{x} new emails."
    Else: "a broken mailbox."
) & " Click the icon for details."    

I love this, but I'm not sure how the average "bus-driver programmer" would take to it.

Versus the old way -- less powerful, and much less DRY -- not just longer, but also you have to change in 4 places if decide not to &= the value, but assign to a second variable, or Return it from your function, etc.:

Dim message = "You have "
Select Case emailCount
    Case 0: message &= "no new email."
    Case 1: message &= "one new email."
    Case Is > 1: message &= $"{x} new emails."
    'Testing for Integer type can't currently be done here at all.
    Case Else: message &= "a broken mailbox."
End Select
message &= "Click the icon for details."
zspitz commented 6 years ago

@bandleader

However, if that's the case maybe it shouldn't use Select Case, because Case x today expects x to be a value (or expression) which has to match the LHS, and not an identifier for a new variable that we'll assign to.

Only because in VB.NET there are currently no semantics for introducing a new variable in the first line of a construct. There's nothing similar to using C#'s out var in the start of a block:

if (int.TryParse('1234", out var x)) {
    // x is an int here
}

On second thought, there are actually two such constructs, and neither uses Dim -- Sub / Function definitions:

Sub Foo(x As Integer)
...
End Sub

and lambda expressions / statements:

Dim Bar = Sub(x As Integer)
           End Sub

From the discussion at #177 (and particularly here) it seems that Dim doesn't really do anything; it's nothing more than some kind of placeholder.

bandleader commented 6 years ago

Only because in VB.NET there are currently no semantics for introducing a new variable in the first line of a construct.

@zspitz No, I mean because existing behavior of Case x is to value-match on the value of x, as opposed to being a declaration+assignment to new variable x.

zspitz commented 6 years ago

@bandleader

However, if that's the case maybe it shouldn't use Select Case, because Case x today expects x to be a value (or expression) which has to match the LHS, and not an identifier for a new variable that we'll assign to.

If what we're looking to do is extend Select Case, then we shouldn't change the meaning/context of Case x. And as I mentioned, I think even Case x As Integer doesn't look any less like matching based on existing x or any more like the assignment you propose it should be come. That's why I suggested Case Integer Dim x. ... I mean because existing behavior of Case x is to value-match on the value of x

The existing behavior of the Case clause is already to match some pattern, and not necessarily on the value of x:

Dim x = 5, y = 10
Dim n = 5
Select Case n
    Case x
        Console.WriteLine("Exact match")
        Case x, y
                Console.WriteLine("Either, or")
    Case x To y
        Console.WriteLine("Between")
End Select

I don't think we need to enforce that every pattern not start like the simple value match of Case x because of usability purposes. The RHS of Case is some pattern, of which Case x: is a specific example. (The compiler will be able to differentiate between Case x and Case x As Integer by reading to the end of the pattern, as it (presumably) does today.)

So now the question becomes, what syntax is appropriate for a pattern that introduces a new variable in a child scope, We have two existing syntaxes for introducing identifiers:

  1. Dim Identifier As TypeName or Dim Identifier for local variables and fields
  2. Identifer As TypeName for procedure parameters, and parameters lambda expressions / statements

I think the semantics of a "variable-introducing pattern" are much closer to 2. Dim creates an identifier that is available for the rest of the parent scope, and is not already part of some other construct. The second option is precisely this: introduces a new variable for a child scope as part of an additional statement, one that is only available in said child scope.

gilfusion commented 6 years ago

Also, don't forget that Identifier As TypeName is also used in For, For Each, Using, and Catch, so there is more precedent for that.

On another note, just using Case pattern seems very brittle and not future-proof. In the Case x scenario, remember that x could be any valid expression. So, no pattern could have the same form as a valid expression without risking changing the meaning of existing code.

However, there is one syntax already part of Select Case that might help--what if we could use Case Is pattern, and just make the existing Case Is >… and friends part of pattern matching?

Select Case obj
    Case foo
    Case Nothing
    Case Is > 32
    Case Is s As String
    Case Is (> 0, y As Double) 'Tuple pattern with nested patterns
    Case Is bar When z > 0 'Declare bar with type inference
    Case Else
End Select
zspitz commented 6 years ago

@gilfusion I can't believe I forgot those.

On another note, just using Case pattern seems very brittle and not future-proof. In the Case x scenario, remember that x could be any valid expression. So, no pattern could have the same form as a valid expression without risking changing the meaning of existing code.

I'm OK with that limitation (actually I think this limitation is required -- see end of this post), and most of the patterns I've described here are actually not currently valid expressions, which means they can easily be differentiated from Case <expression>.

The one exception is the tuple pattern ('(' Pattern (',' Pattern)* ')'). However, for value tuples -- which are value types -- comparing equality of the LHS to a tuple expression is functionally equivalent to matching against a tuple pattern. By analogy, comparing against the value 5 is functionally equivalent to matching against the 5 pattern.

However, there is one syntax already part of Select Case that might help--what if we could use Case Is pattern, and just make the existing Case Is >… and friends part of pattern matching?

I'm hesitant to use Is here, because then we'd end up with three different rules for using Is within Case:

which I think would be very confusing.

Moreover, your suggestion of Is only takes care of the highest-level pattern, but patterns can also be nested. Let's imagine somewhere down the line we introduce a new pattern, say for objects which support default property access -- a(x). This pattern could be a valid expression if we have the identifiers a and x in context. The following is indeed unambiguous, because of your suggestion to use Is:

Dim a As New List(Of String)
Dim x As Integer
Select Case n
Case a(x)    ' Checks whether n is equivalent to the value at position x in the list
Case Is a(x) ' Checks whether n is an object with a default property
End Select

But what happens here?

Select Case n
Case Is (a(x)) ' Is this trying to match a 1-tuple with the value at position x?
               ' Or is it trying to match a 1-tuple with an object with a default property?
End Select

I think the only way around this is to force all patterns to be invalid as expressions (unless the pattern is either matching against a value type, or matching against an expression), thus differentiating them from patterns.

zspitz commented 5 years ago

@gilfusion I've already suggested an algorithm for differentiating between patterns and expressions, but @bandleader says -- and I agree -- that this is too confusing, and it is better to have all non-expression patterns be umambiguously non-expressions.

zspitz commented 5 years ago

What about a pattern that matches values/patterns against public properties or fields?

MemberPattern
    : '.' Identifer Equals Expression
    | '.' Identifier `Matches` Pattern
    ;

WithPattern
    : 'With {' MemberPattern (, MemberPattern)* '}'
    ;

Pattern
    // patterns with subpatterns
    : Pattern ',' Pattern                      // OR pattern
    | '(' Pattern (',' Pattern)* ')'           // Tuple pattern

    // patterns without subpatterns
    | '*'                                      // Matches any value, including Nothing
    | Literal                                  // Matches a literal value
    | 'Of' TypeName WithPattern?               // Type check pattern -- matches when subject is of TypeName
    | Identifier 'As' TypeName WithPattern?    // Variable pattern -- introduces a new variable in child scope
    | WithPattern                              // Checks public properties / fields for matching values
    | 'Is'? ComparisonOperator Expression      // Comparison pattern
    | 'Like' StringExpression                  // Like pattern
    | Expression 'To' Expression               // Range pattern
    | Expression                               // Equality pattern -- value/reference equality test against Expression
    ;

When used with the type-check pattern or the variable pattern, Intellisense on the available members could be provided.

But this could also be useful for objects whose shape is not known at compile time, such as deserialized JSON.

Dim o As Object
Select Case o
    Case p As Person With {.Age Matches 18 To 25}
    Case p1 As Person With {.LastName Matches Like "A*"}
    Case With {.FirstName Of String}, With {.BirthDate Of Date}  ' matches only if property exists and is of the appropriate type
End Select

Admittedly, this has some overlap with the When clause, but I can see a few differences:

ping @bandleader.

zspitz commented 5 years ago

For the typecheck-and-assign pattern, I'd use Case T Dim x. Reasoning: Your Case x As T looks like a modification of Case x which is a value match.

I want to note that after consideration (and some private discussion with bandleader) I agree with this.