Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/
Other
1.93k stars 552 forks source link

[feature] Thoughts on a non-smart "match" operator #17290

Open leonerd opened 4 years ago

leonerd commented 4 years ago

TL;DR - a suggestion of new syntax to provide a "switch"-like case dispatch semantic without running into the "string or number" problems inherent in smartmatch.

It is a truth universally acknowledged, that any language in possession of typed values must be in want of a way to dispatch on comparisons between them.

Traditionally perl authors have used techniques like if/elsif chains. Perl 5.10 added the ill-fated smartmatch and given/when syntax - the less said about them the better. It suffices to note that they were "too smart" and tried to guess whether the author meant string or number comparison, or how to distribute "any"-style logic among arrays or hashes or various other "guess-what-I-mean" behaviours.

Meanwhile, CPAN provides some nice neat solutions - one to draw particular attention to is Switch::Plain. This doesn't attempt to do anything fancy. The programmer has to ask for number (nswitch) vs stringy (sswitch) semantics. It's trivially simple but it works really well in practice.

I think this suggests a decent motivation for having something like it in core. We already have quite a lot of infix operators which could be used, so I would like to wave as a first-draft some thoughts on syntax:

match($x) on eq {
    case "abc" { say "This is obviously a string eq match" }
    case "def" { ... }
    default    { ... }
}

In a first iteration this can easily be done (as a CPAN module if necessary) as purely syntax-sugar over the optree created by unfolding it into the obvious if/elsif/... plus using a pad temporary to evaluate $x just once.

By forcing the programmer to explicitly state their comparison operator, we've avoided perl having to guess if they wanted strings or numbers, because if they wanted numbers they can

match($i) on == {
    case 1 { ... }
    case 2 .. 5 {
        say "Ah now here's a question - do we allow case lists to have any-like semantics?"
    }
}

match($s) on =~ {
    case m/[a-z]/ { ... }
    case m/[0-9]/ { ... }
    case m/@.@/ { ... }
}

This now motivates why I wanted isa to be an infix operator (#17200):

match($obj) on isa {
    case DerivedClass { ... }
    case Another::Class { ... }
    case Base::Class { ... }
}
leonerd commented 4 years ago

A possible objection to the syntax presented above is that of the dangling infix operator after the match EXPR on ... introduction keyword. It feels a bit ugly just sitting there, not infixed between two things as it usually lives. Maybe we can do better than this by somehow taking something like the Raku "whatever star", it if would fit syntactically, and notate these behaviours instead as:

match ($x eq *) {
    case "abc" { ... }
    case "def" { ... }
}

match ($i == *) { ... }

match ($obj isa *) { ... }

Syntactically it looks a lot neater, and one could be tempted to accept any sort of expression in there to allow arbitrary match operators:

match (*->can_handle($req)) {
    case SimpleRequestHandler { ... }
    case WeirdRequestHandler { ... }
    case AnotherRequestHandler { ... }
}

though this becomes hard to reason about whenever expressions in the match condition have side-effects; what would this do?

match ($x++ == *) {
    ... # many cases
}
tonycoz commented 4 years ago

The second version appeals to me more, though we already have a typical spelling for *, it's $_, though using it here would be reversed in meaning from the current switch() implementation.

As to side-effects, the call to can_handle() could have side effects too.

The main issue I can see with side-effects is optimization, let's say we have:

match ($x eq *) {
case Foo { ... }
case Bar { ... }
}

and we want to perl itself to optimize that internally - is $x (which might have get magic or overloading) evaulated for each case in the optimized form?

Grinnz commented 4 years ago

I prefer the first version only because we don't have anything like the second currently, and it implies a level of "do whatever you want here" that could lead to a lack of optimization for the common cases (in syntax and performance) - the purpose of this is to meet the common needs, so that people stop trying to use smartmatch to do it.

tobyink commented 4 years ago

I offer https://metacpan.org/pod/Switcheroo for consideration.

switch ($x) {
  case "abc": say "it's abc";
  case "def": say "it's def";
  default:    say "it's something else";
}

Blocks are allowed:

switch ($x) {
  case "abc": { ++$_; say "it's abd"; } # switch aliases $_ to $x
  case "def": { say "it's def"; }
  default:    { say "it's something else"; }
}

You can change the type of comparison operator used, and it uses the $a and $b variables to express how to do the comparison. Yeah, maybe on would be better than mode.

switch ($x) mode ($a<$b) {
  case 10:  say "it's less than 10";
  case 100: say "it's less than 100";
  default:  say "it's big!";
}

You can slip in an arbitrary boolean block as a case:

switch ($x) mode ($a eq $b) {
  case "abc":            say "it's abc";
  case { $_ =~ /def/i }: say "it's def (kinda)";
  default:               say "it's something else";
}

Switcheroo allows this to act like an expression:

say switch ($x) do {
  case "abc": "it's abc";
  case "def": "it's def";
  default:    "it's something else";
};

Notice where do is. But in retrospect, I think this is unnecessary and confusing. If people want to use it in an expression, they can use a normal do block.

say do { switch ($x) {
  case "abc": "it's abc";
  case "def": "it's def";
  default:    "it's something else";
}};

Switcheroo uses match::simple as its default comparison operator, but obviously I wouldn't recommend that for a core feature. Just use ($a eq $b) as the default comparator.

xsawyerx commented 4 years ago

The first version seems Perlish to me until we see == and isa, which read uncomfortably for me. Then again, the second version, to me, is far more uncomfortable.

I think we need to think about this further.

warpspin commented 4 years ago

I'd rather have a return of possible smartmatch simplifications and given/when, shielded behind a use feature/use version with a really simple, matching table:

$anything ~~ undef      # !defined($anything)
$anything ~~ $smartmatch_overloading_object # calls overload
$anything ~~ $regexp # $anything =~ $regexp
$anything ~~ $scalar # $anything eq $scalar
$anything ~~ $coderef # if( $coderef->($anything))
$anything ~~ @array # $anything in @array, strictly eq comparison

and no fancy stuff with arrays/hashes/whatever on the LHS at all. Yes. I even left out a numbers special case, as this will avoid ambiguities with the contents of @array in the array case. For the simple cases like ints, string comparison will still work and anything fancy like ranges can be implemented with something alike Smart::Match which could be included with Perl.

And then a return of given/switch/match/whatever you want to call it using that operator strictly. In a second release possibly also as expression, returning its last matched expression as value:

my $result = match $something {
    when "abc": 'alphabet';
    when "xyz": 'gibberish';
    when @keyword: 'in keyword list';
};

Honestly, with the simplifications in perl 5.10.1 the smart matching is almost there. Even something like 'use strict "smartmatch";' which would allow disabling any dependencies on the left hand side of the match would help already, though I'd prefer something which would simply be active with use version/use feature

The reason given/when is a flop is mainly the remaining LHS dependencies plus, simply said, missing trust people have after the numerous incompatible changes. Yet another switch/match will not restore the trust, so we can as well go with a solution which is more powerful than the suggestions here, will mostly match the use cases which were already possible with the existing given/when feature minus the confusion and from there on the trust will return if it's stable for a while.

tobyink commented 4 years ago

@warpspin that's almost what match::simple implements

warpspin commented 4 years ago

@tobyink Yes, just looked at it. Really basically fits my thoughs, but those Sub::Infix operators, uh :D But yes, Switcheroo/match::simple in the core would basically do what I want. I mainly dislike the mode thing about that plus the necessity for additional do in expressions

kurahaupo commented 4 years ago

(I'm a long-time Perl user, and made some early commits to Raku, but I haven't been involved with this community before, so please excuse me if the terminology I've borrowed from compiler design seems out of place here.)

I look at this from two points of view.

On the one hand, the original impetus for "switch" in C was to allow a table-driven despatch to the cases, without actually doing repeated comparisons. It would be nice to at least not rule out that level of efficiency when choosing a syntax for this feature. To achieve that would mean knowing at compile time that the lookup value is integer (or can be hashed to a number), the cases are integer constants (or can be hashed at compile time), and what the upper & lower bounds are.

On the other hand, I like the expressiveness and wouldn't want those constraints enforced on the usage.

I like the brevity of simply stating on and an infix operator, but allowing any kind of match expression would also be nice; so I'm thinking something more like this:

switch ($x) {
    case == 0: { do zero stuff }
    case == 1: { do one stuff }
    case eq "five": { do string stuff }
    case isa BigNum: { do biggie stuff }
    match { 2 <= $_ <= 10 } { do range stuff }
    default: { do default stuff }
}

(This would be as an alternative to the syntax suggested at the head of this issue, or perhaps allowing a hybrid.)

Notionally case stands in for the value of $x in each comparison, but it's also a keyword that introduces a, well, case, and takes an infix comparison operator and an expression (which for the sake of sanity, should not have a top-level operator whose normal precedence is lower than that of the comparison operator). In the first release it could be restricted to a single literal value, and relaxed later.

Static analysis could therefore group together all consecutive cases with the same comparison operator, where they specific compile-time constants, and perform a direct despatch if it's economical to do so.

The default keyword is equivalent to match {1}; perhaps there could be just one keyword for both cases, but omit the comparator block for a default.

The infix operator could even be one that takes a LIST or RANGE as its RHS, suggesting a putative:

case grep_num 2,3,5,7,11,13,17,19: { do small primes stuff }
case between 20,39: { do tween stuff }

I would consider allowing explicit lists and ranges, but not arrays:

case == 1 .. 5: { do stuff for the first five positive integers }
case eq 'small', 'medium', 'large': {}

I would prefer not to allow any kind of fall-through, not even:

case == 1:
case == 2: { do stuff for one or two }

And lastly, I think it should be at least a warning if the same operator and constant value appears in more than one case.

tobyink commented 4 years ago

@kurahaupo the following already works:

for ($x) {
    if($_ == 0) { do zero stuff }
    elsif($_ == 1) { do one stuff }
    elsif($_ eq "five") { do string stuff }
    elsif($_ isa BigNum) { do biggie stuff }
    elsif(2 <= $_ <= 10) { do range stuff }
    else { do default stuff }
}

I don't think your example is much clearer, and there's only a three character difference in length.

kurahaupo commented 4 years ago

The point I'm aiming for, though perhaps not making very well, is the idea that a switch should be, if possible, a single despatch, so that the order of the cases is unimportant.

Whilst an if/elsif/else chain achieves the same thing, it actively emphasises the opposite idea, that they are ordered and that the first match "wins".

With that in mind, I should perhaps withdraw the proposal to allow varying the comparison operator between cases, and leave it fixed from the beginning.

The other aspects such as requiring compile-time constants and allowing lists & ranges remain valid proposals.

haarg commented 4 years ago

The only way you'd be able to really enforce having only one case match is if they were all using the same comparison operator, and only eq or ==, so having to repeat it on each case would be redundant. And given overloaded objects, you still wouldn't be able to enforce only one match.

kurahaupo commented 4 years ago

Agreed that repeating identical operators on every case would be pointless; I thought it'd be optional after each case when it matches the operator declared with on (if any), or alternatively, where it's the same as the preceding case. However now I'm questioning whether different comparison operators within a switch should or shouldn't be allowed; I'm undecided on that point.

Assuming a common comparison operator, ordered comparisons and isa could work using "most specific match wins", which for isa would mean using the same rule as for method despatch, and for other operators would mean notionally reordering the comparisons.

However that implies that the value would be numified, stringified or classified, and then despatched, rather than invoking overloaded operators. An alternative would be to inspect the value and only do direct despatch if it doesn't overload the declared operator.

If the operator is overloaded, then the notional reordering becomes real, though perhaps a binary search could be used where the operator's standard semantics imply a fully ordered domain (lt le ge gt < <= >= >). As an implementation detail, that search could be rebalanced occasionally by keeping usage counters. The order that the cases are written could be a hint for the initial balancing.

I suggest that using more than one operator where any of them is an ordering operator should elicit a warning or error, because it becomes difficult for humans to reason about the implied reordering.

haarg commented 4 years ago

If the operators can be different between cases, then more than one can match, and that makes the order of the cases important.

There is no such thing as a "most specific" match with the isa operator.

Attempting to convert the value into a string or number before calling the op is not how any other part of perl works. I really don't think it's viable to make ordering unimportant.

kurahaupo commented 4 years ago

Most of my last comment was prefaced with "assuming a common comparison operator,…" so it's not clear how I'm supposed to interpret a response that's prefaced with the opposite assumption.

I'm finding this frustrating because it seems like pieces of what I wrote are being taken out of context.

These points seem to have been ignored:

I'm sure there are flaws in my idea and maybe in the logic that I used to derive it, but so far the only cogent rebuttal is simply that numifying or stringifying and then comparing "isn't how the rest of Perl works". That's a good point, but is it sufficient? It's really no different from the sswitch/nswitch idea that was floated earlier in this conversation.

kurahaupo commented 4 years ago

And just for the record, method injection would be a stupidly clumsy way of implementing this; OBVIOUSLY I don't mean that to be taken as "how it actually works".

(Why do I feel like I have to anticipate every perverse fragmented reading of what I've written, and write a rebuttal in advance?)

tonycoz commented 4 years ago

I suggest that using more than one operator where any of them is an ordering operator should elicit a warning or error, because it becomes difficult for humans to reason about the implied reordering.

I'd tend to think in terms of optimizing what we can, and falling back to a case by case check.

So if some initial sequence of cases were a numeric equality checks against compile-time constants and the value wasn't overloaded we could do a table lookup.

I do think retaining the fallback on an overloaded value is useful, consider code written under use bigint.

In any case I'd expect an initial implementation to compile down to an effectively if-else chain (maybe with different ops) and later changes could add optimization into table lookups etc (a sequence of regexp matches might combine all the regexps into one.)