erlang / eep

Erlang Enhancement Proposals
http://www.erlang.org/erlang-enhancement-proposals/
264 stars 67 forks source link

Add EEP-0057: Implement alternative patterns for expression matching #27

Closed saleyn closed 2 years ago

saleyn commented 3 years ago

Extend the syntax to allow multiple matches of a case statement to share the same body.

essen commented 3 years ago

This should apply to function clauses as well.

ferd commented 3 years ago

Wouldn't | refer to a cons cell, and the logical 'or' token actually be ; as used in Guards and clauses? | seems like something borrowed from other languages' 'or' rather than something fitting the actual conventions around Erlang.

saleyn commented 3 years ago

@ferd, to me the pipe | syntax seems more intuitive, since using the it for the cons cell notation is usually combined with the list brackets ([ ... | T]), and using it without the brackets currently has no special meaning, but is well familiar to people coming from other languages. Moreover, it's consistent with the use of | in spec's for the same reason - providing alternative patters. Though I do agree with you that ; for delimiting clauses is more natural to Erlang. Here I don't advocate any specific choice of syntax, but rather would love to see this missing functionality in the language, which would make implementations more terse.

Also, it seems that using ; as a pattern delimiter would be ambiguous whether the {A,1} is the end of the pattern and that partial pattern is erroneous with a missing body, or whether it's part of the alternative pattern that follows:

case Expr of
  {A, 0} ->
    ok;
  {A, 1};     % <-- This is ambiguous, as ';' commonly indicates the "end" of the pattern/guard/body
  {A, 2} when is_integer(A) ->
    ok
end
saleyn commented 3 years ago

This should apply to function clauses as well.

Are the guard constraints not sufficient for functions? Could you give an example? One can argue that the joined functionality can be achieved in a case statement using guards as well, but to me it's less readable:

case compare(X, Y) of
  Res when Res == lt; Res == eq ->
    true;
  _ ->
    false
end.

vs

case compare(X, Y) of
  lt | eq ->  true;
  _       ->  false
end.
essen commented 3 years ago

It's just about as readable in function guards as it is in case guards.

The extreme example is https://github.com/ninenines/cowlib/blob/master/include/cow_parse.hrl where I've abstracted the ugly guards in a .hrl file. Ideally those would just be using |. I started writing an EEP a while back about it (using the same pipe) but never managed to complete it.

Common examples would be StateName = connecting | connected and the likes. Sure you can use guards but it can get messy when you have other things to test in guards. Because of the precedence you have to write StateName =:= connecting, OtherGuards; StateName =:= connected, OtherGuards.

saleyn commented 3 years ago

It's just about as readable in function guards as it is in case guards.

The extreme example is https://github.com/ninenines/cowlib/blob/master/include/cow_parse.hrl where I've abstracted the ugly guards in a .hrl file. Ideally those would just be using |. I started writing an EEP a while back about it (using the same pipe) but never managed to complete it.

Common examples would be StateName = connecting | connected and the likes. Sure you can use guards but it can get messy when you have other things to test in guards. Because of the precedence you have to write StateName =:= connecting, OtherGuards; StateName =:= connected, OtherGuards.

Though it may seem to be a similar requirement, perhaps addressing changes in guard syntax should be a separate EEP? Notably, the example you are referring to can be simplified by using a simple parse transform (e.g. gin), whereas the use of alternative patterns requires a modification of syntax.

lhoguin commented 3 years ago

I gave two examples, you ignored the second, which is very much the same as your examples, only in function clauses.

I would like to write handle_state(connecting | connected, ...) when OtherGuards. To make it more obvious, what you want:

case compare(X, Y) of
  lt | eq ->  true;
  _       ->  false
end.

What I want:

foo(X, Y) -> bar(compare(X, Y)).
bar(lt | eq) -> true;
bar(_) -> false.

I want this syntax to be extended to all pattern matches, not a select two. This also means that lt | eq = compare(X, Y) should be allowed. Based on previous discussions on this topic I would say that's something OTP team would prefer as well.

The first "extreme" example cannot be solved with simple parse transforms. What you see is not the full picture, it is merely those that could be (more or less) easily abstracted. But it runs into the same issues as I mentioned above with regard to precedence and would work out much better via the pipe operator.

zuiderkwast commented 3 years ago

@saleyn it's much more clear after the updates!

Agree with @lhoguin, if the pattern syntax is extended, it should be in all places where patterns are used, because patterns are patterns. That is: match expressions Pattern = Expression, case, receive, function clauses, fun, try-of, try-catch and the generators in list comprehensions and bit string comprehensions.

saleyn commented 3 years ago

I gave two examples, you ignored the second, which is very much the same as your examples, only in function clauses.

I would like to write handle_state(connecting | connected, ...) when OtherGuards. To make it more obvious, what you want:

case compare(X, Y) of
  lt | eq ->  true;
  _       ->  false
end.

What I want:

foo(X, Y) -> bar(compare(X, Y)).
bar(lt | eq) -> true;
bar(_) -> false.

I want this syntax to be extended to all pattern matches, not a select two. This also means that lt | eq = compare(X, Y) should be allowed. Based on previous discussions on this topic I would say that's something OTP team would prefer as well.

The first "extreme" example cannot be solved with simple parse transforms. What you see is not the full picture, it is merely those that could be (more or less) easily abstracted. But it runs into the same issues as I mentioned above with regard to precedence and would work out much better via the pipe operator.

@lhoguin, I see. I misunderstood your point thinking that you were referring to changes in guards. It does make sense, indeed.

saleyn commented 3 years ago

@saleyn it's much more clear after the updates!

Agree with @lhoguin, if the pattern syntax is extended, it should be in all places where patterns are used, because patterns are patterns. That is: match expressions Pattern = Expression, case, receive, function clauses, fun, try-of, try-catch and the generators in list comprehensions and bit string comprehensions.

@zuiderkwast, @lhoguin, do you agree that a restriction to have the same variables bound in all alternative patterns to be a reasonable constraint? I.e.:

{A, 1} | {A, 2} = some_fun()

vs

{A, 1} | {B, 2} = some_fun()

I wonder if such a generic pattern syntax extension you are suggesting would be able to get accepted, given that it would require a lot more work on test cases in the compiler pertaining to match expressions, case, receive, function clauses, fun, try-of, try-catch, generators, etc.

lhoguin commented 3 years ago

I think the restriction is reasonable. I haven't found an argument against yet.

I think most uses of | in pattern can be implemented as a normal case block. The compiler can easily rewrite lt | gt = compare(X, Y) into the case block equivalent before it continues, even if variables are involved. Not sure about the exception.

zuiderkwast commented 3 years ago

do you agree that a restriction to have the same variables bound in all alternative patterns to be a reasonable constraint

@saleyn Yes, very reasonable IMO. I wouldn't want to allow maybe-unbound variables.

I wonder if such a generic pattern syntax extension you are suggesting would be able to get accepted, given that it would require a lot more work

It's probably a lot of work yes. Do you think a suggestion where patterns behave differently in different kinds of expressions are more likely to get accepted?

Did you check the compiler source code?

Maybe the AST can be rewritten by the parser to just duplicate the patters containing the pipe, so case X of lt | gt -> ok end becomes case X of lt -> ok; gt -> ok end. For match expressions, it could rewrite lt | gt = some_fun() to something like case some_fun() of lt -> lt; gt -> gt end. Just an idea.

It would be interesting to hear from the OTP team, but maybe they'll wait until the EEP is submitted(?).

saleyn commented 3 years ago

Maybe the AST can be rewritten by the parser to just duplicate the patters containing the pipe, so case X of lt | gt -> ok end becomes case X of lt -> ok; gt -> ok end. For match expressions, it could rewrite lt | gt = some_fun() to something like case some_fun() of lt -> lt; gt -> gt end. Just an idea.

Yes, indeed. It should be pretty trivial to implement it this way, as the extension is merely a syntactic sugar for eliminating the redundant code, which the parser can replicate behind the scenes, and for function matches with alternative expressions (e.g. foo(lt | eq, a | b) -> true;) we'd have:

foo(lt, a) -> true;
foo(eq, a) -> true;
foo(lt, b) -> true;
foo(eq, b) -> true;

It's probably a lot of work yes. Do you think a suggestion where patterns behave differently in different kinds of expressions are more likely to get accepted?

It would be interesting to hear from the OTP team, but maybe they'll wait until the EEP is submitted(?).

@zuiderkwast, I too, would like to hear thoughts of the OTP team on this proposal. Though it seems that an iterative approach of getting the syntax extension in different kinds of expressions to have a greater likelihood to be accepted and released, a partial change to expressions might as well be considered a half-measure and get rejected. After all years back when we submitted the implementation draft of the SCTP protocol for the emulator, given the scope of changes I didn't think that that would ever be accepted, but it actually did.

Based on your last comment, is there another form of EEP submission, or a pull request here is sufficient?

@zuiderkwast, @lhoguin, I'll try to modify the EEP to add your suggestion of a more generic change to allow alternative expressions across the board, but it would likely take a few days.

RaimoNiskanen commented 3 years ago

I am not from the OTP team ;-) Well, I am, but this is not my home turf, so in this case I am practically not.

In my opinion this feature should not apply to patterns, or to all places where there are patterns, but rather to where there are multiple clauses with patterns such as; case clauses and function+fun clauses, and I can not think of others.

A guard with ; in these context is today conceptually expanded to multiple bodies (and then optimized to collapse the multiple bodies (if I remember correctly)), and as @zuiderkwast pointed out this feature could be just the same.

case foo() of
    X when X =:= a; X =:= b ->
        Body
end

becomes

case foo() of
    X when X =:= a ->
        Body;
    X when X =:= b ->
        Body
end

So, conceptually

case foo() of
    a | b ->
        Body
end

would become

case foo() of
    a ->
        Body;
    b ->
        Body
end

Meaning that if we introduce variables to make it more interesting:

case foo() of
    {a, X} | {b, X} ->
        {body, X}
end

would expand to (conceptually)

case foo() of
    {a, X} ->
        {body, X};
    {b, X} ->
        {body, X}
end

So, if you would use variables that are not bound in all alternative patterns, you would get, just as you would today; a variable unbound when used in a body, or variable unsafe after case if used after the case statement.

When we get this kind of improvement suggestions they often come with a reference implementation in the form of parser modifications or a parse transform just to prove that the operator in question works as expected syntactically and semantically.

I think, since this suggestion sounds familiar, it has most certainly been suggested before, and deemed to hard or impossible to implement, so I think something like that is needed to know if this suggestion is feasible. And I think it has to be tested if | actually works, or if ; works better, and deduced why the selected one is better than the other(s) (there might be others). The | operator has the clear disadvantage that it is allowed within patterns so it gets a different meaning on the outer level than inside pattern alternatives.

case foo() of
    [{a, A} | B]|{A,B} ->
        {ouch, A, B}
end

If alternative case patterns can not be seen as this kind of simple clause duplication, it has to be proven or motivated why, and explained what alternative is better. The EEP, I think, also will have to explain why the alternative case patterns should be working as selected, i.e should they work on all patterns, on clauses, on Comprehensions (more on that later), on some other construct, etc...

I guess I am saying that it would be nice / necessary to have some kind of prototype to be convinced that alternative case (+function +fun) patterns is something that works in the language.

Regarding Comprehensions, @zuiderkwast mentioned them:

[X || {a,X} | {b,X} <- L]

Here I see no clear rewrite of what this would mean in current Erlang. It is something like:

[case Pattern of
     {a, X} | {b, X} ->
         X;
     _ ->
         error({case_clause, X})
 end  || Pattern <-L]

but you can not trick the Comprehension by throwing a case_clause that it was no match.

Nevertheless, I think the EEP would have to work through the Comprehensions too, and decide if and/or how this should be dealt with.

My 2c

HansN commented 3 years ago

First, I have "always" missed the possibility to have multiple clauses with one body in case statements.

Second, just a thought without any deep analysis with the proposed ';' or '|' symbol. Overloading a very common symbol with a new meaning could be problematic and hard to read IMO.

Therefore, if one mean:

case foo() of
    {a, X} ->
        {body, X};
    {b, X} ->
        {body, X}
end

I think it is much clearer and intuitive with just allowing several clauses in place of one clause:

case foo() of
    {a, X} ->
    {b, X} ->
        {body, X}
end

instead of:

case foo() of
    {a, X} | {b, X} ->
        {body, X}
end

It would also make cons patterns with alternatives more comprehensible:

case foo() of
    [{a, A} | B]|{A,B} ->
        {ouch, A, B}
end

becomes:

case foo() of
    [{a, A} | B]->
    {A, B} ->
        {not_ouch, A, B}
end

My 0.2 ¢ as private person (not the OTP opinion)

saleyn commented 3 years ago

I think it is much clearer and intuitive with just allowing several clauses in place of one clause:

case foo() of
    {a, X} ->
    {b, X} ->
        {body, X}
end

instead of:

case foo() of
    {a, X} | {b, X} ->
        {body, X}
end

@HansN, using your suggested approach what would you recommend as the alternative syntax for the following cases, so that conceptually they would not be significantly different from the alternative matches in the case, receive, try-of expressions?

lt | eq = compare(A, B)

foo(lt | eq) -> true

Or in your opinion, the EEP shouldn't mix the later with the former, and not address the alternative matches in functions?

HansN commented 3 years ago

In my opinion matching a LeftValue with a RightValue ( the = operator ) is a completly different thing than the clause syntax in case, receive and try statements.

Today we have something like

<stmt> ::= case  <expression> of <clause-list> end
              |  receive <clause-list> [after <integer>-> <stmt-list> ] end
              .....
              |  <pattern> = <expression>

<clause-list> ::= <clause> [<clause-list]
<clause> ::= <clause-head> <stmt-list>
<clause-head> ::= <pattern> [<guard>] -> 

So something like

lt | eq = compare(A, B)

or

case foo() of
    {a, X} | {b, X} ->
        {body, X}
end

is a change of <pattern> to <pattern-list>, while my suggestion is a change of the <clause-head> syntax to

<clause-head> ::= <pattern> [<guard>] -> [<clause-head>] 

And, in my opinion, one role of the case statement is to allow more than one <pattern> to match an <expression>.

Regarding alternatives in functions, my opinion is that the alternatives should be on the level of function clauses, analogous to case clauses.

If you change the pattern to take alternatives ( like a|b = f() ), should that apply also to "sub" patterns (like {a, b|c} = f())? That could give ambiguities like:

[a,b|c] = g()
HansN commented 3 years ago

@RaimoNiskanen

>  Regarding Comprehensions, @zuiderkwast mentioned them:
> 
> [X || {a,X} | {b,X} <- L]

> Here I see no clear rewrite of what this would mean in current Erlang. It is something like: 
> [case Pattern of
>      {a, X} | {b, X} ->
>          X;
>      _ ->
>          error({case_clause, X})
>  end  || Pattern <-L]
>
> but you can not trick the Comprehension by throwing a case_clause that it was no match.

I would re-write it in current Erlang as

[X ||  {Tag,X} <- L,
        Tag == a ; Tag == b
]
zuiderkwast commented 3 years ago

First, I have "always" missed the possibility to have multiple clauses with one body in case statements.

Very nice to get the individual attention from OTP members. :-) It's a good sign.

@HansN Wouldn't P1 -> P2 -> Body be hard to parse?

If you change the pattern to take alternatives (like a|b = f()), should that apply also to "sub" patterns (like {a, b|c} = f())? That could give ambiguities like:

Yes, I assumed that {a|b, A} would be allowed and equivalent to {a, A} | {b, A}. If that's not the case, then we're not really extending the pattern syntax but just the case/receive/etc. expressions.

The pattern [A | B] is indeed ambiguous now. It's either alternative patterns in a singleton list or a cons. We'd have to make it cons to keep backwards compatibility, i.e. give precedence to cons. Or use semicolon or something else.

A different but related topic: In Haskell, it's possible to repeat a clause with different guards without repeating the pattern. Here's an example in pseudo-Erlang syntax:

case X of
    {A, B} when A  > B -> gt;
           when A  < B -> lt;
           when true   -> eq
end
saleyn commented 3 years ago

The pattern [A | B] is indeed ambiguous now. It's either alternative patterns in a singleton list or a cons. We'd have to make it cons to keep backwards compatibility, i.e. give precedence to cons. Or use semicolon or something else.

@zuiderkwast, one thought against using the semicolon would be that matches of A; B = foo() would have a different meaning from A|B = foo(), in which A and B = foo() would become separate expressions.

It seems that using | with giving a precedence to cons would be one way to dealing with the issue, or possibly to modify the syntax to encapsulate the alternative matches in parenthesis to disambiguate the expressions: [(1|2)] = foo(). Hence, this would be a valid syntax: [(1|2) | T] = foo(), where T is the cons, and (1|2) is the alternative match.

essen commented 3 years ago

If both ; and | create problems, let's pick something else, even 2 characters if necessary, it shouldn't be seeing that much use (unlike binaries...). Perhaps |= or <>.

About lt | gt = compare(X, Y) the good thing about this is that you can avoid some case expressions. For example:

Z = case compare(X, Y) of
    {one, Z0} -> Z0;
    {another, Z0, _Ignore} -> Z0
end

becomes

{one, Z} | {another, Z, _} = compare(X, Y)

But it can be left aside for now if it causes problems.

saleyn commented 3 years ago

I guess I am saying that it would be nice / necessary to have some kind of prototype to be convinced that alternative case (+function +fun) patterns is something that works in the language.

@RaimoNiskanen, a working prototype would certainly be quite helpful. Though given my limited availability, my intent was to kick off the discussion through this EEP as I've been craving for this feature in Erlang for two decades. Maybe someone on the list could help with implementing the prototype?

zuiderkwast commented 3 years ago

Even if it will be unusual in code, both | and ; would be intuitive for a reader, but anything like |= or <> would be like WAT??? (no offence). Previews:

foo({a, X} <> {X, b}) -> X.     % Wat???
bar({a, X} |= {X, b}) -> X.     % Wat???

I like @saleyn's idea about parentheses for disambiguation, e.g. [(a|b)]. The pipe is especially nice considering it's already used in the types and specs for the same thing, so we get this symmetry:

-spec f(a|b) -> ok.
f(a|b) -> ok.

Regarding comprehensions, [X || {a, X} | {b, X} <- L] can be rewritten as @HansN did, but it doesn't work for all patterns. We'd have to do something like @RaimoNiskanen did, but to get an empty generator instead of an exception, maybe we can rewrite it to something like this?:

[Y || Y <- [case Pattern of
                {a, X} | {b, X} ->
                    [X];
                _ ->
                    []
            end || Pattern <- L]]
saleyn commented 3 years ago

Edited the EEP to reflect the reviews mentioned here.

saleyn commented 3 years ago

Regarding Comprehensions, @zuiderkwast mentioned them:

[X || {a,X} | {b,X} <- L]

Here I see no clear rewrite of what this would mean in current Erlang. It is something like:

[case Pattern of
     {a, X} | {b, X} ->
         X;
     _ ->
         error({case_clause, X})
 end  || Pattern <-L]

@RaimoNiskanen, @zuiderkwast, @HansN: presently mismatched terms in comprehension patterns are filtered out rather than failing with an exception, so, possibly doing something like this might be more "in spirit of" the current functionality of comprehensions for the syntax shown in the example above:

[case I of 
{a, X} | {b, X} -> X
end
|| I <- L,
case I of
{a, _} | {b, _} -> true;
_               -> false
end
]
bjorng commented 3 years ago

I am from the OTP team and I've done some work in the compiler. ;-)

I like the general idea of alternate patterns at the top-level and I think that| is a reasonable separator. I am not that sure about alternate patterns nested within a pattern.

First let's look at how alternate patterns at the top-level can be implemented. The first compiler pass to be considered is the parser (erl_parse). Introducing new syntax could always lead to conflicts, necessitating rewriting of rules or ugly tricks. In this case, I think that using | as separator at the top-level would not cause any major problems.

The next pass is erl_lint. I don't foresee any major difficulties here.

We then come to translation to Core Erlang in v3_core. Expanding alternate patterns separated by | to multiple clauses does not really work, because there could be a combinatorial explosion:

foobar(a|b|c, d|e|f, g|h|i, x|y|z) -> . . .

Instead, the matching would have to be rewritten similar to this (in Erlang as opposed to Core Erlang and simplified):

foobar(A, B, C, D) ->
    case A of
        _ when A =:= a; A =:= b; A =:= c ->
            case B of
                _ when B =:= d; B =:= e; B =:= f ->
                    . . .
            end
    end.

The groundwork for breaking apart the matching of a pattern into multiple nested matchings was done in https://github.com/erlang/otp/pull/2521, so it seems that implementing alternate patterns could be done with a reasonable amount of work. It might also be necessary to update Dialyzer to ensure that it fully understands and analyses the code translated in this way (that could actually be more work than the work in the compiler).

Another complication is translation of patterns with different shapes (such as {ok,Res} | {error,Res,_}). At the moment, I don't know exactly how to handle them. It might be necessary to duplicate the body (which would be optimized later for simplish bodies).

bjorng commented 3 years ago

I am not sure that nested alternate patterns (such as {ok|error, Info}) is a good idea, especially not if the | is the separator. I think that if they should be supported, I would want to see more motivation and use cases where they would be useful.

Regarding implementation, there is more potential for problems in the parser, but because the requirement of using parentheses around the alternate patterns, it might still be possible to implement. In the translation to Core Erlang, there could be additional complications for nested alternated patterns with different shapes (such as {{a,b}|{x,y,z}}).

bjorng commented 3 years ago

@ferd, to me the pipe | syntax seems more intuitive, since using the it for the cons cell notation is usually combined with the list brackets ([ ... | T]), and using it without the brackets currently has no special meaning, but is well familiar to people coming from other languages. Moreover, it's consistent with the use of | in spec's for the same reason - providing alternative patters. Though I do agree with you that ; for delimiting clauses is more natural to Erlang. Here I don't advocate any specific choice of syntax, but rather would love to see this missing functionality in the language, which would make implementations more terse.

Also, it seems that using ; as a pattern delimiter would be ambiguous whether the {A,1} is the end of the pattern and that partial pattern is erroneous with a missing body, or whether it's part of the alternative pattern that follows:

case Expr of
  {A, 0} ->
    ok;
  {A, 1};     % <-- This is ambiguous, as ';' commonly indicates the "end" of the pattern/guard/body
  {A, 2} when is_integer(A) ->
    ok
end

This explanation of why ; is not used as the separator should be incorporated into the EEP itself.

RaimoNiskanen commented 3 years ago

@zuiderkwast wrote

I like @saleyn's idea about parentheses for disambiguation, e.g. [(a|b)]. The pipe is especially nice considering it's already used in the types and specs for the same thing, so we get this symmetry:


-spec f(a|b) -> ok.
f(a|b) -> ok.
I still think that overloading the operator semantics is confusing.  When it is right within `()` or at the outer level it means alternative patterns, but when it is right within `[]` it means `cons`.

In the type specification syntax there is no overloading, because that syntax ignores the cons operator since it does not handle heterogenous lists, improper lists nor lists with order, so in the type specification language the | operator only means alternatives.

@saleyn

Yes, I know that throwing an exception to rewrite

[X || {a,X} | {b,X} <- L]

does not work, which was what I was trying to point out. If there were a "nomatch" exception that could be thrown from comprehension matching code, than that would work. But comprehensions simply lay out code that does not match.

Here is another possible rewrite:

[X || {X} <-
    [case Item of
         {a,X0} | {b,X0} ->
             {X0}; % Set of matched variables
         _ ->
             nomatch
     end || Item <- L]]

and this rewrite pattern could be extended to any size of matched variables tuple {X0, ...}.

One thing this EEP would have to investigate, though, is how a rewrite would interact with multiple filters and generators.

@HansN wrote

I would re-write it in current Erlang as

[X ||  {Tag,X} <- L,
        Tag == a ; Tag == b
]
Yes, but that breaks down for general patterns:
```erlang
case foo() of
{a, X} | {a, b, X} ->
X
end

A new question

How should this feature handle overlapping patterns?

foo({a, X} | {X, b}) ->
    X.

What would foo({a, b}) return?

I know the first choice would probably to select left to right matching order and therefore return b, but then we have introduced an order within alternative patterns when we do not have any order between arguments.

And that would mean that:

foo(X, {a, X} | {X, b}) ->
    X.

called as foo(a, {a, b}) would succeed and return a if arguments were evaluated left to right, but fail with a function clause if arguments were evaluated right to left, since the alternative pattern, would bind X to b.

The EEP would have to investigate this.

lhoguin commented 3 years ago

Overlapping patterns could be rejected by the compiler. I don't think we want an order defined for the reasons you mentioned, and because we can just create a separate clause if needed.

Overlapping patterns are hopefully not too common because they would make for poor interfaces to begin with (although I suppose they might be found in some data structure implementations).

I am not sure that nested alternate patterns (such as {ok|error, Info}) is a good idea, especially not if the | is the separator. I think that if they should be supported, I would want to see more motivation and use cases where they would be useful.

Top-level only would be a good start but it would be great to support it also in records. To give one random example:

precondition_is_head_get(Req, State=#state{method=Method})
        when Method =:= <<"HEAD">>; Method =:= <<"GET">> ->
    not_modified(Req, State);
precondition_is_head_get(Req, State) ->
    precondition_failed(Req, State).

could become:

precondition_is_head_get(Req, State=#state{method= <<"HEAD">> | <<"GET">>}) ->
    not_modified(Req, State);
precondition_is_head_get(Req, State) ->
    precondition_failed(Req, State).

And if we support it in records, then why not maps, etc.

HansN commented 3 years ago

A topic for the EEP to define is what @bjorng and @RaimoNiskanen already touched but not explicitly:

If the pattern is a tuple or a list with alternatives in more than one element, in which order are they to be investigated? In clauses (function, case, receive, try) they are, as we all know, tested in order of appearance in the source code:

case E of
    {a, c}  -> ...;
    {a, d} -> ...;
...
end

If we match a single pattern, e.g. case E of {a|b, c|d} -> ... end, the order of expansion must be defined in the EEP. Preferable it should be the same in {a|b, c|d} = E.

The notation

case
    P1 ->
    P2 ->
    ...
    Pn -> S
end

where Pi are today's Erlang patterns, avoids this as well as the combinatorial explosion risk identified by @bjorng .

zuiderkwast commented 3 years ago

@bjorng

Expanding alternate patterns separated by | to multiple clauses does not really work, because there could be a combinatorial explosion

How about simply limiting the explosion to 5 pipes, 32 expanded clauses? It can be enough in practice. (Error "too many patterns".)

I am not sure that nested alternate patterns (such as {ok|error, Info}) is a good idea, especially not if the | is the separator. I think that if they should be supported, I would want to see more motivation and use cases where they would be useful.

@lhoguin's first example (two days ago) was StateName = connecting | connected, i.e. binding the alt. value to a variable.

In ssl_cipher there are some examples like this one:

mac_hash({3, N} = Version, MacAlg, MacSecret, SeqNo, Type, Length, Fragment)  
  when N =:= 1; N =:= 2; N =:= 3; N =:= 4 ->

@RaimoNiskanen

How should this feature handle overlapping patterns?

If it's documented as being syntactic sugar for multiple clauses expanded left-to-right, the semantics can be the same as for multiple clauses, i.e. first match served. (Or reject overlapping patterns, sure, if it can be done.)

bjorng commented 3 years ago

@zuiderkwast

@bjorng

Expanding alternate patterns separated by | to multiple clauses does not really work, because there could be a combinatorial explosion

How about simply limiting the explosion to 5 pipes, 32 expanded clauses? It can be enough in practice. (Error "too many patterns".)

No, that would violate the Zero one infinity rule.

What I meant in my comment was that the implementation could not be done by simple expansion. We should allow any number of | separators, and the implementation should be sufficiently sophisticated to handle it.

saleyn commented 3 years ago

EEP updated to reflect the reviews and points addressed in this discussion.

RaimoNiskanen commented 3 years ago

The EEP does not talk about the implications of overlapping patterns on argument evaluation order. See my example above.

foo(X, {a, X} | {X, b}) ->
    X.

I fear this is a non-solvable problem which implies that @HansN's suggestion is the feasible one.

Indirectly I think that also implies that the a | b = foo() variant is not doable, which the EEP currently brushes away with saying that a -> b -> c = foo() creates more confusion instead of recognizing the possibility to simply not support that construct.

RaimoNiskanen commented 3 years ago

This EEP currently suggests::

a | b = foo()

Today already we have the possibility

case foo() of
    X when X =:= a; X =:= b -> ok
end

that with the other extensions of this EEP can be written as:

case foo() of
    a | b -> ok
end

Note that the first variant will probably produce a Dialyzer warning and need to be modified to:

_ = (a | b = foo())

which the case construct does not need. The case construct also allows for an arbitrary body.

I guess what I am saying is that I think not having the a | b = foo() construct is not a big loss, so the EEP should not try so hard to keep it. I think, like @HansN, that it is a different animal that is not essential to have.

saleyn commented 3 years ago

The EEP does not talk about the implications of overlapping patterns on argument evaluation order. See my example above.

foo(X, {a, X} | {X, b}) ->
    X.

I fear this is a non-solvable problem which implies that @HansN's suggestion is the feasible one.

@RaimoNiskanen, wouldn't the left-to-right order of evaluation also solve this problem? After all, if the syntax above is equivalent to:

foo(X, {a, X}) -> X;
foo(X, {X, b}) -> X.

then, the function clauses are evaluated top-down until there's a match.

lhoguin commented 3 years ago

This EEP currently suggests::

a | b = foo()

Today already we have the possibility

case foo() of
    X when X =:= a; X =:= b -> ok
end

that with the other extensions of this EEP can be written as:

case foo() of
    a | b -> ok
end

This is why I am fine with dropping it if it proves to be a problem. But if not, by all means include it.

Note that the first variant will probably produce a Dialyzer warning and need to be modified to:

_ = (a | b = foo())

which the case construct does not need.

Dialyzer doesn't produce warnings when you match on the value explicitly (assert), it only does when you ignore the returned value. Here it would match on either a or b.

RaimoNiskanen commented 3 years ago
foo(X, {a, X}) -> X;
foo(X, {X, b}) -> X.

then, the function clauses are evaluated top-down until there's a match.

Yes, and there it is obvious.

But in a function head it is explicitly stated that the evaluation order is undefined, and the alternative patterns would have to have a defined order, or disallow overlap (which does not happen for other clause sequences). Then we have an order between alternative patterns but not between arguments with alternative patterns, which is confusing.

RaimoNiskanen commented 3 years ago

@lhoguin wrote:

Dialyzer doesn't produce warnings when you match on the value explicitly (assert), it only does when you ignore the returned value. Here it would match on either a or b.

Oh yes, you are right. I got confused that the match in itself returned a or b, and there you have a value that is ignored. So a rewrite to

case foo() of
    a -> a;
    b -> b
end

would be a statement that ignores the return value, meaning that this simple rewrite does not work. The compiler would have to be smarter.

RaimoNiskanen commented 3 years ago

@saleyn: If we spice this up further:

foo(X, {a, X}|{X, b}, {c, Y}|{Y, d}) ->
    {X, Y}.

Then we have a situation where we have to know in which order argument 2 and 3 are expanded.

RaimoNiskanen commented 3 years ago

Anyway. The I think EEP does not get to the bottom of how far @HansN's suggestion can take us, compared to the current suggestion.

saleyn commented 3 years ago

But in a function head it is explicitly stated that the evaluation order is undefined, and the alternative patterns would have to have a defined order, or disallow overlap (which does not happen for other clause sequences). Then we have an order between alternative patterns but not between arguments with alternative patterns, which is confusing.

This does sound like a paradox, but if this feature is documented as the syntactic sugar for a combinatoric expansion of the arguments with left-to-right order as written, such that your example above is semantically equivalent to this evaluation with a well-defined order, would it still have to comply with no order in function's argument evaluation or still be confusing?

1> [{Arg1,Arg2} || Arg1 <- [{a,'X'},{'X',b}], Arg2 <- [{c,'Y'},{'Y',d}]].
[{{a,'X'},{c,'Y'}},
 {{a,'X'},{'Y',d}},
 {{'X',b},{c,'Y'}},
 {{'X',b},{'Y',d}}]

Generally, I think, this feature of alternative patterns is probably going to be found very useful for simple cases, and for more involved nested ones, the code will be more readable when broken out in separate clauses.

lhoguin commented 3 years ago

Oh yes, you are right. I got confused that the match in itself returned a or b, and there you have a value that is ignored. So a rewrite to

case foo() of
    a -> a;
    b -> b
end

would be a statement that ignores the return value, meaning that this simple rewrite does not work. The compiler would have to be smarter.

Yes it needs to see if it'll be used or if it's just an assert, and in the latter case it can return ok instead of the value it matches. (Or whatever mechanism the compiler has when it needs to ignore return values.) I suppose this can be added to the EEP but it's not a huge deal. :-)

RaimoNiskanen commented 3 years ago

I may be repeating myself...

foo(X, {a, X} | {X, b}) ->
    X.

Evaluating foo(a, {a, b}) seems to either produce a, if arguments are evaluated left to right, or raise a function clause if arguments are evaluated right to left.

Defining that this should be expanded to:

foo(X, {a, X}) ->
    X;
foo(X, {X, b}) ->
    X.

clarifies that. And now we have had to define an order between alternative patterns, and we also would have to have a defined order between alternative patterns of different arguments.

Yes I find it confusing to define this expansion order while we have an undefined argument evaluation order. We have an undefined argument evaluation order because the evaluation order is not supposed to matter. But suddenly, with alternative patterns, it does! Confusing.

While e.g (the separator is just an example)

foo(X, {a,X}) |
   (X, {b,X}) ->
       X.

avoids this problem entirely. Note that when the body has a more relevant size, the extra characters compared to the current EEP proposal brings more clarity than burden, in my opinion.

We already have that argument pattern matching is performed on the whole argument list and that the argument order should not matter. Therefore I think it is a better fit if the pattern alternatives are between different argument lists, not pattern alternatives within the arguments. As in @HansN's suggestion.

essen commented 3 years ago

foo(X, {a, X} | {X, b}) -> X. is a pitfall, but how likely is it to be encountered in practice? I would much prefer to allow alternatives in arguments, even nested, and have the compiler reject ambiguous alternative matches with an error, than to have alternatives at the level of the list of arguments.

If the evaluation order of alternatives would become significant then the compiler should simply refuse to compile.

(How easy to detect is another question I know nothing about, though.)

RaimoNiskanen commented 3 years ago

And I think that if the compiler would refuse to compile overlapping patterns, that would be strange because now with pattern matching we have a defined order between patterns (and possibly a warning if a pattern is covered by all previous), so if one were not allowed to write gradually looser patterns with this construct would be bad.

saleyn commented 3 years ago

Yes I find it confusing to define this expansion order while we have an undefined argument evaluation order. We have an undefined argument evaluation order because the evaluation order is not supposed to matter. But suddenly, with alternative patterns, it does! Confusing.

Though, if the order of evaluation of alternative patterns in arguments is documented as left-to-right combinatoric expansion, the implementer has a choice of either using them when the undefined function argument evaluation order doesn't matter (the majority of times?), or spelling out the independent function clauses, in cases when the preservation of the undefined order matters.

RaimoNiskanen commented 3 years ago

@saleyn wrote:

Though, if the order of evaluation of alternative patterns in arguments is documented as left-to-right combinatoric expansion, the implementer has a choice of either using them when the undefined function argument evaluation order doesn't matter (the majority of times?), or spelling out the independent function clauses, in cases when the preservation of the undefined order matters.

I think the implementer should not be allowed that choice, because the reader can not know if the implementer made the "right" choice. Syntax should be designed to ease the burden for the code reader.