Extend `match` syntax to enable safe type narrowing

godotengine / godot-proposals

Godot Improvement Proposals (GIPs)

MIT License

1.16k stars 97 forks source link

Extend `match` syntax to enable safe type narrowing #6941

Open ator-dev opened 1 year ago

ator-dev commented 1 year ago

Describe the project you are working on

GDScript

Describe the problem or limitation you are having in your project

In the current implementation of GDScript, there is a limitation when it comes to type narrowing. Developers are unable to treat a variable as a more specific type within certain code blocks, such as if statements or match expressions. This limitation can lead to verbose and less readable code.

The safety of a type check is highly contextual. In segment A, foo += 1 would seem to be what it is: an entirely safe operation. However, something as simple as adding a function call in B might violate that assumption. Perhaps it makes foo a string, meaning that the operation will no longer succeed. C transparently violates the assumption, while D is more nebulous; during the 1 second wait period, anything could modify foo to be something else. The point being that it is not sensible to try to infer type safety based on an earlier check. Adding special cases such as assuming type safety directly after the check (E) is a workaround, but one that significantly increases the complexity of the analyser and is arbitrary.

See "If this enhancement will not be used often ..." for workaround details.

var foo = 2
if foo is int:
    print(foo + 1)

var foo = 2
if foo is int:
    bar()
    print(foo + 1)

var foo = 2
if foo is int:
    foo = ""
    print(foo + 1)

var foo = 2

func _ready():
    if foo is int:
        await get_tree().create_timer(1).timeout
        print(foo + 1)

var foo = 2
if foo is int:
    var foo_int := foo
    print(foo_int + 1)

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Discussed on Rocket.Chat with several contributors. Hopefully the discussion will be enhanced here.

We can extend the already exceptional match syntax to complete type checking.

Case syntax should be introduced where:

The expression value is checked against an explicit type.
If the equivalent check using as would be successful (i.e. not return null for nullable types), the case block is executed.
If a variable is optionally (discuss) explicitly introduced, it is statically typed as the type checked against; therefore, we can guarantee full type safety. And theoretically:
Multiple types can be checked against at once, as in the "A", "B", "C": case syntax already allowed. We could use commas in the same way, or pipes (|) to conform with common type theory syntax. This might amount to pretending we have a feature which we don't (yet), however, and couldn't be combined with an explicit variable since it would not be type safe in current GDScript.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

Option A

match foo:
    as String:
        # We only know that foo was a String at time of entry, and move on.
        pass
    var bar as String:
        # We know that bar is, and will always be, a String. We can use it with type safety.
        pass

Option B

match foo:
    is String:
        # We only know that foo was a String at time of entry, and move on.
        pass
    is String as var bar:
        # We know that bar is, and will always be, a String. We can use it with type safety.
        pass

If used currently in GDScript with proposed syntax (option A):

If this enhancement will not be used often, can it be worked around with a few lines of script?

Due to the demonstrated type safety limitations with type checking, in order to enforce type safety, developers must manually cast variables a lot of the time. The resulting code is less maintainable and readable, due to being more verbose and specifying types manually; essentially a violation of DRY (don't repeat yourself).

var foo = 2
if foo is int:
    var foo_int := foo as int
    print(foo_int + 1)

var foo = "string"
var foo_str := foo as String
if foo_str != null: # This is awful and assumes nullable types
    print(foo_str + " which is a string")

Is there a reason why this should be core and not an add-on in the asset library?

This is core GDScript.

SlugFiller commented 1 year ago

This is also useful with respect to godotengine/godot#76843, because narrowing a nullable type (e.g. int?) to its non-nullable equivalent (e.g. int) is also a form of narrowing for which all the above considerations apply.

joao-pedro-braz commented 1 year ago

I feel like each option has a different strength:

option B has the is String, which IMO is a really clear and intuitive way of type-checking the matched expression:
```
match foo:
  is String:
      pass # foo is a String, at least initially!
```
but option A has the var bar as String, which IMO does a better job at conveying the idea that "bar" is a (shallow) copy of the matched expression with a hard type:
```
match foo:
  var bar as String:
      pass # bar is a String and nothing else!
```

So... why not introduce both? Type-checking pattern with is String and Type-narrowing on an ad-hoc variable with var bar as String.

nlupugla commented 1 year ago

Edit: I would actually like option B much more if we just left out the is. Is that token needed for the parser?

Looking over the options again, I think is String doesn't read as well as var bar as String.

The match syntax in GDScript is

match expression:
    pattern:
        # block

One of the nice things about the available match patterns in GDScript is that for the purposes of understanding code, they can all be read aloud as essentially

if expression == pattern:
    # block

Even patterns like [42, ..] can still be read aloud as expression == [42, ..], even if it isn't technically valid GDScript. The existing var new_var pattern is the biggest exception in my opinion because the phrase expression == var new_var doesn't really make much sense when read aloud. Given that it is already a part of the language, it probably makes the most sense to read it as expression == <anything, and I'll call it "new_var">.

With that in mind, I read option A's

match expression:
    var new_var as String:

as if expression == <any string, and I'll call it "new_var">: On the other hand, option B's

match expression:
    is String

doesn't read the same way because if expression == is String doesn't make any sense.

ator-dev commented 1 year ago

I am now in favour of option B (with a modification), after reading the comments left here and giving the issue much more consideration. I initially advocated for option A in the belief that it would be more consistent, as it aligns more with the way we declare variables.

As shown below, a syntax which requires prepending an arbitrary variable name confuses the overall semantics, and makes it difficult to see at a glance what is being checked against. I am including potential future extensions of the syntax. In introducing this significant new feature, it is important that we consider how future adaptations would interact with it.

match foo:
    as int:
        pass
    var vector3 as Vector3: # Difficult to compare; offset; add/remove var name from the start
        pass
    as Node2D:
        pass
    var some_node as MeshInstance3D | CollisionShape3D | PhysicsBody3D: # Possible future syntax
        pass
    var node as Node3D:
        pass
    as Control:
        pass
    var vector2 as Vector2:
        pass
    var str = "string which is long and easy to mistype": # Possible future syntax
        pass # Messy. We are now comparing values, but it is difficult to tell the difference.
    "a string":
        pass
    var attitude = "hostile", "neutral", "friendly": # Possible future syntax
        pass
    "A", "B":
        pass

The below variation, in comparison, is beautiful and readable - in the mode of GDScript.

match foo:
    is int:
        pass
    is Vector3 var vector3:
        pass
    is Node2D:
        pass
    is MeshInstance3D | CollisionShape3D | PhysicsBody3D var some_node: # Possible future syntax
        pass
    is Node3D var node:
        pass
    is Control:
        pass
    is Vector2:
        pass
    "string which is long and easy to mistype" var str: # Possible future syntax
        pass
    "a string":
        pass
    "hostile", "neutral", "friendly" var attitude: # Possible future syntax
        pass
    "A", "B":
        pass

Crucially, the case statements begin with the indicator of the checking operation. Instead of having the analyser/interpreter look past a var ... every time just to see what type of equality check is used (and determine what is valid), the type of statement can be inferred just from what it starts with, making it easy to parse for machine and human alike. Further, when a line has been partially typed or is being edited, it is much more likely that the analyser can infer the intent.

match foo:
    var bar█ # Is it a type check? A value check? A fallthrough? Who knows...

match foo:
    var b█is Vector2: # The user goes to the line start to add/remove a var, confusing the analyser

The fact is, the var ... is incidental to the case statement. It has no effect on the execution condition for the case block, yet by putting it at the start, we imply greater importance and compel users to type it first. Much easier to trivially add and remove var ...s at the end of the lines, having no effect on the "look and feel" of the match. This way we also do not encourage users to "align" their variable name lengths for clarity.

With is ..., we always place the operative part at the beginning.

You may notice that I omitted as before var <name>. Is there any situation in which we would place some other keyword before var? Besides, as already has a meaning which is distinct from variable assignment. I think that var alone should be sufficient, and is less verbose.

@nlupugla

if expression == is String doesn't make any sense

You could also view each case as having an implicit operator at the beginning. This makes is the natural choice.

match foo:
    == "a string": # Implied by lack of anything else
        pass
    is String: # Explicit
        pass
    dict_matches { "key": "value" }: # Implied by {
        pass
    array_matches [ "item", .. ]: # Implied by [
        pass

nlupugla commented 1 year ago

You could also view each case as having an implicit operator at the beginning.

Agreed. I like the way you wrote out the operator implied by the string, dict, and array example above.

This makes is the natural choice.

You lost me here. If all other match patterns start with an implied operator, why should this new pattern be the only one that starts with an explicit operator? That's essentially my complaint about starting with is.

ator-dev commented 1 year ago

It's about existing language semantics. All of the other cases use an implied equality/match operator (==, dict_matches, array_matches). as is not an equality operator, it is a function that coerces the type of a value into a narrower type. I believe is is a better fit, because it is an equality operator, just one that works on type.

nlupugla commented 1 year ago

It's about existing language semantics. All of the other cases use an implied equality/match operator (==, dict_matches, array_matches). as is not an equality operator, it is a function that coerces the type of a value into a narrower type. I believe is is a better fit, because it is an equality operator, just one that works on type.

If the choice is between starting the line with is versus as, then I definitely agree it makes more sense to start with is. I'm arguing that to be consistent with other match patterns, the line shouldn't start with any kind of operator. That being said, our conversation on RocketChat is beginning to convince me that starting the line with is is important due to concerns with readability and variable shadowing.

nlupugla commented 1 year ago

I'd like to bring up an alternative that @SlugFiller, @ator-dev, and I discussed recently on RocketChat.

Instead of extendingmatch to enable safe type narrowing, another approach would be to introduce an expression that both returns a boolean and binds a variable.

For example, consider a new expression of the form

x is var new_var : Type

This expression returns the result of x is Type and creates a new local variable of type Type. If x is Type == true, then new var is initialized exactly as if it had been declared using var new_var : Type = x in the same block. Otherwise, it is declared as if the user had written var new_var : Type in the same block. As code, it would be roughly equivalent to

var new_var : Type
if x is Type:
    new_var = x
x is Type

Here's an example of how this expression could enable safe type narrowing.

func _input(event: InputEvent) -> void:
    if event is var event_mouse : InputEventMouse :
        pass # react to mouse using "event_mouse" local variable
    else if event is var event_key : InputEventKey :
        pass # react to keyboard using "event_key" local variable
    else:
        pass # react to any other event using "event" parameter

Alternatively, using match:

func _input(event: InputEvent) -> void:
    match event:
        var event_mouse : InputEventMouse : # implied expression is "event is var event_mouse : InputEventMouse"
            pass # react to mouse using "event_mouse" local variable
        var event_key : InputEventKey : # implied expression is "event is var event_key : InputEventKey"
            pass # react to keyboard using "event_key" local variable
        var event_other # the usual fall through pattern can now be understood as "event is var event_other : Variant"
            pass # react to any other event using "event_other" local variable

As you can see, this approach helps justify the current match fall through pattern, whose syntax I find somewhat funny compared to the other match patterns.

match x:
    var y: # Why is this the pattern for fall through? It can now be understood as "x is var y :  Variant", which is always true and always binds "x" to "y"

One important objection that was mentioned was the danger of relying on default/uninitialized values for the cases where x is var y : Type evaluates to false. This could potentially be worked around by allowing the user to supply a fallback value using syntax such as x is var y : Type ?? z, although it is admittedly clunky looking.

Specific syntax aside, the core idea is to solve explicit narrowing by introducing a new expression. I think this approach has a few merits:

We don't have to add extra functionality to if and match. The fewer weird edge cases for keywords the better.
As a standalone expression, it is easier to test. It doesn't need to be evaluated in a bunch of different contexts (except maybe for getting the local scoping of the new variable to work correctly)
The new expression could potentially be used to solve other problems that come up in the future.

ator-dev commented 1 year ago

I should have mentioned, this might be better in a separate proposal, even though it does relate to this one. It's quite a complicated topic that would take effect all over the language. I recommend copying it over if you want more specific discussion. (:

I will do my best at answering these points though.

We don't have to add extra functionality to if and match. The fewer weird edge cases for keywords the better.

The thing is, I don't think the proposed change to match is a weird edge case. It's a natural extension, just one that requires an additional keyword for disambiguation. As I mentioned earlier, there are already 3 types of matching: ==, dict_matches, and array_matches. By adding another one we allow the user to make even fuller use of the match construct; appropriately since in type theory, types extend to specific values, making type checking arguably an extension of value checking.

As a standalone expression, it is easier to test. It doesn't need to be evaluated in a bunch of different contexts (except maybe for getting the local scoping of the new variable to work correctly)

This needs a separate discussion I think. I'm not convinced, because I don't think we have any expressions that are anything more than a series of operations that equates to an in-place value. Scoping of this would be difficult.

The new expression could potentially be used to solve other problems that come up in the future.

Perhaps, but I also see value in introducing specific syntax to for, while, if, or whatever else needs it, just as with match. It may be versatile, but it does not look nice to read, perhaps due to being so generic.

Edit: Actually I acknowledge that the match cases look quite readable, despite having an additional : (which I am not sure about either - it would make interpreting more complicated). However, I am not sure how this would extend to allowing vars in more places, since you are saying that var at the start implies a type check. I view this as an arbitrary language decision to be wary of.

geekley commented 1 year ago

I tend to agree that an expression that atomically both returns a boolean and declares a variable is the way to go.

an additional : (which I am not sure about either - it would make interpreting more complicated)

I think it would be better to use as instead of : for the expression, like:

if event is var event_mouse as InputEventMouse:
  pass # event_mouse can be used here

Or better, copy C# syntax, which is clear and concise:

if event is InputEventMouse event_mouse:
  pass # event_mouse can be used here

geekley commented 1 year ago

One important objection that was mentioned was the danger of relying on default/uninitialized values for the cases where x is var y as Type evaluates to false

I believe you don't have to rely on uninitialized values, because you'd statically know that on that branch the variable would not "exist", and you should not be able to even reference it. For example, it's important to allow this pattern:

if not x is int i:
  # print(i) # would be a static compiler error, as i is undefined in this branch
  return null # useful when I KNOW Variant x is int and don't want more unnecessary nesting
my_int_fn(i)

That's how it works in C#, btw - you can even add compound logic with and and or on the if. I think (correct me if I'm wrong) you can always statically know which one of the 2 if/else branches will never have the new variable. For the other branch, I believe you can always know if will be initialized or if it might be initialized - and if it's not GUARANTEED, you should not allow referencing it at all either. This would force a better and clearer refactoring of the code. For example:

if a and x is int i:
  print(2*i) # i is ALWAYS initialized here
else:
  return # i is NEVER initialized here, so not accessible

if a or not x is int i:
  return # i is NEVER initialized here, so not accessible
# i is ALWAYS initialized here

if a or x is int i:
  print(2*i) # ⛔ should give compiler error because i is NOT guaranteed to be initialized here
  fn()
else:
  return # i is NEVER initialized here, so not accessible

# a working refactoring of the example above
if a or x is int: # no variable declaration
  if x is int i: # only introduce it separately
    print(2*i) # i is ALWAYS initialized here
  fn() # i is NOT accessible here, since it's completely out of scope
else:
  return # i is NOT accessible here, since it's completely out of scope

# alternative valid refactoring
if a or x is int: # no variable declaration
  print(2*i if x is int i else null) # type-safe since print expects Variant anyway
  fn() # i is NOT accessible here, since it's completely out of scope
else:
  return # i is NOT accessible here, since it's completely out of scope

dalexeev commented 1 year ago

See also #3733.

After godotengine/godot#80247 (#632), I remembered our discussion in Godot Contributors Chat and and the following syntax began to look more consistent to me (I just remembered if we have other declarations in which we cannot specify the type):

match foo:
    # Optional type specifier for match bind pattern.
    var bar: String:
        # ...Actions with bar...

Although in this case the type specifier is for filtering, it is different from for, which does not skip elements that do not match the type.

But actually I'm not sure if we should use match to narrow the types. In my opinion the following current solutions are acceptable:

var foo: Node
if foo is Node2D:
    var bar: Node2D = foo
    # Actions with `bar`...

var foo: Node
var bar := foo as Node2D
# If `foo` is not `Node2D`, then `bar` equals `null`.
if bar:
    # Actions with `bar`...

As for nullable types, in my opinion they are not much different from other types (Type? is just a supertype of Type). The only problem is that currently the as operator works inconsistently for Variant types, causing a runtime error instead of returning the default value of the given type. If we change this, then this will work:

var foo: Variant
var bar := foo as String?
# If `foo` is not `String?`, then `bar` equals `null` (default of `String?`).
if bar:
    # Actions with `bar`...

var foo: Variant
var bar := foo as String
# If `foo` is not `String`, then `bar` equals `""` (default of `String`).
if bar:
    # Actions with `bar`...

The disadvantage of these options is that you cannot specify a different default value in case of an error, or check if there was an error (in the case of as). But you can always use is:

var foo: Variant # Or `String?`. Doesn't matter, any `String` supertype.
if foo is String:
    var bar: String = foo
    # Actions with `bar`...
else:
    # Error processing...

var foo: Variant # Or `String?`.
var bar: String = "default"
if foo is String:
    bar = foo
# Actions with `bar`...

nlupugla commented 1 year ago

But actually I'm not sure if we should use match to narrow the types.

I don't like the idea of splitting type narrowing across two lines. I have several reasons for this, but perhaps the biggest one is that inserting a line in between would invalidate the typing.

For example:

if foo is Node2D:
    var bar : Node2D = foo

ensures that bar is a Node2D, but

if foo is Node2D:
    if randf() > 0.5 : foo = 0
    var bar : Node2D = foo

doesn't work half the time. In principle, I suppose one could statically analyze the intervening code to determine whether all paths result in a valid bar assignment, but seems overly complicated for what we are trying to achieve.

nlupugla commented 1 year ago

I think the solution for type narrowing should play nicely with match pattern guards if those are ever implemented. In fact, type narrowing can be seen as a special case of pattern guards. For example, if GDScript had pattern guards such as

match x:
    var y if x > 0:
        # code treating y as positive

then a natural way to do type narrowing would be

match x:
    var y : int if typeof(x) == TYPE_INT and x > 0:
        # code treating y as a positive int

Arguably, the typeof(x) == TYPE_INT could be inferred from the declaration var y : int, but I included it here to be explicit and show the relationship between type narrowing and pattern guards.