Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0

Terminal circumfix token should have (optionally) higher precedence #284

Open alabamenhu opened 3 years ago

alabamenhu commented 3 years ago

Problem

While it is possible to adjust the precedence of a circumfix operator with the is tighter() trait, that only applies to the precedence of the start token. Once inside the circumfix, the terminal token is only tried after infixes and postfixes have been considered.

Details

This is problematic: it means that the terminal token cannot begin with any of the same characters as an infix or postfix, so many terminators that would be desirable (and make logical sense) cannot be used. For instance, the following sequences are, in effect, verboten as the onset of a terminal circumfix token: , : Z X ff | ^ / + - * ~ o x ? ! < [ ( { as well as a few other two- and three-letter sequences. When a custom infix or postfix operator is also in scope, the restriction becomes even more limiting.

For a practical example, see the comments on Reddit. Creating a circumfix:«<html> </html>» (which a user may reasonably wish to do) fails, because the current parse is:

<html> $foo </html>
│      │    │││   ╰compile error 
│      │    ││╰literal text 'html'
│      │    │╰regex starter
│      │    ╰less than
│      ╰variable $foo
╰begin &circumfix:«<html> </html>»

This is because &infix:«<» takes precedence over the closing token. Adding is tighter(&infix:«<») to the circumfix does not change the parse.
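For contrast, here is a minimal sketch of a circumfix that does parse. Its closing token ⌋ is assumed not to be the start of any built-in infix or postfix, so the parser finds it as soon as the inner expression ends. The ⌊ ⌋ operator is only an illustration and is not something proposed in this issue.

```raku
# Hypothetical example operator: floor via a circumfix.
# The closer ⌋ cannot be read as the start of an infix or postfix,
# so it is matched once the inner expression (including infixes) ends.
sub circumfix:<⌊ ⌋>(Real $x) { $x.floor }

say ⌊3.7⌋;        # 3
say ⌊-2.5 + 1⌋;   # -2  (the infix + is parsed first, then ⌋ closes)
```

Replacing ⌋ with a closer that begins with one of the characters listed above (for example </html>) is what triggers the failure described in this issue.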

alabamenhu commented 3 years ago

In effect, the problem is that there are two tokens but only one way to set precedence. I can see two possible solutions:

  1. Add a second trait or set of traits, e.g. is eager, or, to allow a similar degree of granularity, more-eager, as-eager and less-eager.
  2. Allow a second argument for (post)circumfix precedence traits, using * (Whatever) for the default. For the html example, we'd write circumfix:«<html> </html>» is tighter(*, &infix:«<»)

I'd lean toward the latter, although admittedly someone who wanted to give the initial and terminal tokens different precedence traits would end up with an odd-looking definition (is tighter(foo,*) is looser(*,bar)); such a degree of specificity would likely be rare.

pmichaud commented 3 years ago

There may be another issue here: as described above, I think there might be a problem with the tokenizer. It seems to me that longest-token matching should treat </html> as a single token rather than backtracking to treat it as separate < and / tokens. Either that, or the problem isn't yet sufficiently described in this ticket.

Pm
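To illustrate the longest-token-matching behavior described above, here is a toy Raku grammar. It is not Rakudo's actual grammar; the name Toks and its token set are hypothetical. Because every alternative is a candidate of the same proto token, LTM prefers the seven-character literal </html> over the one-character <. Whether Rakudo's circumfix closer competes against infix candidates in this way is exactly the open question in this thread.

```raku
# Hypothetical toy tokenizer: all candidates belong to one proto,
# so LTM picks the longest declarative prefix at each position.
grammar Toks {
    token TOP { <tok>+ }
    proto token tok {*}
    token tok:sym<close> { '</html>' }   # 7-character literal
    token tok:sym<lt>    { '<' }         # 1-character literal
    token tok:sym<slash> { '/' }
    token tok:sym<word>  { \w+ }
}

my $m = Toks.parse('</html>');
say $m<tok>.elems;   # 1  (one token, not <, /, html, ...)
say ~$m<tok>[0];     # </html>
```

If < were chosen first here, the trailing > would match no token and the whole parse would fail, so LTM is what makes this input succeed.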

alabamenhu commented 3 years ago


That is definitely a possibility I had considered. An extreme test case might be an idea I toyed with back in the day:

sub circumfix:<¿ ?> (\x) {  so x }
sub circumfix:<¡ !> (\x) { not x }

These would actually conflict with the prefix tokens ! and ?. If the issue does come down to LTM, do you think these would be possible without separately setting the precedence of the terminal ! and ? tokens?