Open asoffer opened 3 years ago
Whether or not a conversion exists, the pattern match will succeed. However if no such conversion exists, but an error will be emitted due to the missing conversion.
I do not understand this section; edit for clarity?
Hope that helps
To make this happen, we need change character literals to be prefixed with a different symbol. We propose ! for this symbol.
I don't love this
In c languages, the implicit conversion from char to bool (and vice versa) makes this syntax surprising.
While I do like symbols in Ruby and elixir, :
might be a better option for char literal
Looking at this
// Accepts a constant N, and partitions an array whose length is a multiple of N
// into an array of length-N arrays of the same elements.
partition ::= (N :: u64, x: ~[N * `M; `T]) -> [M, N; T] { ... }
Is it possible to build a pattern for chunk
, ie where the overflow goes into the last "part" of the array?
To make this happen, we need change character literals to be prefixed with a different symbol. We propose ! for this symbol.
I don't love this In c languages, the implicit conversion from char to bool (and vice versa) makes this syntax surprising. While I do like symbols in Ruby and elixir,
:
might be a better option for char literal
:
might be hard to parse because of its use as in declarations. Does it help that Icarus doesn't use !
for logical negation nor does it have that conversion? Like, it's not ideal, but at least you get decent error messages. For the first time I'm really starting to feel the sigil real estate issues.
Looking at this
// Accepts a constant N, and partitions an array whose length is a multiple of N // into an array of length-N arrays of the same elements. partition ::= (N :: u64, x: ~[N * `M; `T]) -> [M, N; T] { ... }
Is it possible to build a pattern for
chunk
, ie where the overflow goes into the last "part" of the array?
It should be, though it's either quite subtle or not achievable at all right now. If we allowed users to define their own patterns it would be quite attainable. Or if we allowed binding with further matching. For example, I could imagine also having `
take a left-hand side:
partition ::= (N :: u64, x ~[(N * `M + Length % N)`Length; `T] -> ...
In this case we would first bind Length
, then checking to ensure it satisfied the pattern, which would further end up binding M
.
Honestly, as I read further I was fairly well won over by !. It's probably untenable if you also wanted it as the not
operator, but given that it isn't I think I'm ok with your choices.
I might need a sigil map to help me understand what they all do. Maybe that's just a view of really structured documentation (eg all single character reserved words)
Can you build up multiple line patterns / bindings?
Can you build up multiple line patterns / bindings?
Am I correct that you're asking about composability? I think this requires making patterns first-class citizens (at compile-time) in the language. I'm not opposed to this, but I don't see an elegant way to do it right now. The problem is, you wouldn't want to do the naive thing:
even_length_array ::= [2 * `N; `T]
// Many lines later
[4; i64] ~ even_length_array
assert(N == 2) // Woah, where did `N` come from?
The problem is the binding variables are declared with the pattern, but actually bound when they're used with ~
.
You'd really want to do something that forces users to name the bindings at the usage cite. Something like:
even_length_array ::= pattern(half_length, element_type; [2 * `half_length, `element_type])
`
[4; i64] ` even_length_array(`N, `T)
The syntax here is really gross though. I don't have good idea, because it should look something like a function definition, but we need to pass variable bindings as parameters, which is just weird.
Compile-time Pattern Matching
Pattern matching is the ability to concisely describe the form that a value takes on in a declarative manner. In the context of generic programming, this comes up frequently. For example, we may want to restrict a function to only accept arrays, though we do not care about the length of the array or the type of the array elements.
We propose a new operator
~
, which allows matching expressions to patterns, provided that the matching can be executed entirely at compile-time. The left-hand side of this operator is the expression to be matched, and the right-hand side is the pattern we intend to match. If the expression does not match the pattern, an error is emitted.Defining and Using Patterns
The simplest pattern is a constant. Constants can be used to enforce compile-time invariants. For a contrived example,
More useful is a pattern to which we can bind further identifiers. To do this, we introduce a new declaration syntax using a backtick (
`
). Syntactically, the backtick accepts an identifier. It denotes a pattern which matches anything. A match also binds the identifer to the value being matched within the local scope.For example,
While pattern matching against types is a common use case, we allow patterns to match against any values.
in some contexts, we also allow
~
to be used as a unary operator. Specifically in contexts where inference is allowed, the unary~
will match the pattern against the value that would be inferred. The most common case would be with type-inference in declarations.In each of these examples, the pattern in the declaration's type-expression matches the type inferred from the initial value. This constrains
p
to be a pointer anda
to be an array of even length.Note that because
~
is a unary operator, it is syntactically valid to write something such asThe semantics of such an expression may seem strange at first.
~`T
still matches the type ofsomething()
, but then the type ofa
becomes[3; T]
, rather thanT
. Such a declaration will then require an implicit conversion fromT
(the type ofsomething()
) to[3; T]
(the type ofa
). Such a conversion does not exist forT
in general, but there may be a user-defined conversion for this specific type. Whether or not a conversion exists, the pattern match will succeed, because the pattern`T
matches everything. However the type ofa
is inferred to be[3; T]
and no conversion exists fromT
to that, so an error will be emitted due to this missing conversion.Function paremeters
Pattern matching in function parameters is no different syntactically or semantically from above but is ubiquitous for generic programming and worth calling out explicitly. Just as in the example with declarations, a pattern placed in a declarations type-expression matches against the type inferred for the declaration. In the case of function parameters, this is the type of the argument passed in at the call-site.
Here are a few examples:
Implicit conversion subtleties
There are some subtleties worth calling out. For instance, should the pattern
*`T
match[*]i64
? Should an array-type match a slice-type Should functions with parameter names match the corresponding nameless function type (e.g.,(n: i64) -> i64
versusi64 -> i64
)? The underlying question here is whether a type should match a pattern to which it can be implicitly converted. While this seems like a nice feature in general, it can be difficult or impossible to determine the right implicit conversion without knowing the type we want to convert to (just a pattern for that type). There may even be multiple such conversions. It therefore seems prudent to disallow conversions entirely. Because buffer-pointer-to-pointer is a conversion, the pattern*
Twill not match
[*]T`.However, this opens up a trap for Hyrum's law. Users of a library may match a library function's return value against a pointer such as
*`T
, thereby causing breakages when the library author decides to change the return type to a buffer pointer, which would otherwise be allowed. We still believe that not matching through implicit conversions is the right solution, but we would want toSyntactic conflicts
Both
`
and~
are currently currently used symbols. The symbol`
is used to denote a character literal, and the symbol~
is used to invert flags. We believe these symbols are more appropriate for pattern matching. Specifically, the tilde fits well because it is commonly used in contexts of similarity. It can be read as "looks like." The reason for preferring a backtick with declarations is a bit more tenuous. We want a symbol that does not have many pixels set so that the focus is on the identifier name. Moreover, we expect`
to (in later additions to the language) accept a left-hand argument further constraining the identifier. For this reason, the whitespace in the sigil`
makesa`b
easier to read than something else likea!b
.To make this happen, we need change character literals to be prefixed with a different symbol. We propose
!
for this symbol.Similarly, we need a replacement for flag inversion. For this we propose the keyword
not
. In fact, there's no reason to have&
,|
or^
when we haveand
,or
, andxor
as well.User-defined patterns
At the moment, we do not provide any mechanism allowing users to define their own patterns. This is a feature we are interested in, but is not crucial.
Run-time pattern matching.
It would be ideal to entirely support run-time pattern matching as well, however this comes with two primary difficulties.
If we can come up with an unobtrusive syntax for failure-handling and find that the performance costs are minimal and often optimizable away, we should revisit this decision for run-time pattern matching.
For the time being we support pattern matching against runtime values, but only if the pattern can be matched at compile-time. The identifier being declared must be the unary
~
operator applied to a pattern.For example, we allow array or struct expansion (a la C++ structured bindings) because the structure of the array is only dependent on the length which is a property of the type.
Arrays
Note: We may also want to add
...
as is done with designated initializers (see below).Designated initializers
For designated initializers, there is a question about what happens for fields that are not bound as part of the pattern. Neither solution is particularly nice. We could silently drop the fields, but then users are likely to miss necessary values and end up with incorrect structs. We could also require all fields to bound, but this also runs a risk with Hyrum's law, making it difficult for a library author to add new fields to a struct.
we propose the following solution: By default, all fields must be present for a match. However, users may also add a
...
marker inside the designated initializer to indicate that more values are being dropped. This allows library authors to add fields to structs, coupled with a tool that inserts...
appropriately.Also as a small amount of syntactic sugar, we allow omitting the type leading the designated initializer in favor of simply
.{ ... }
, as the type must be deduced anyway.