Raku / doc

🦋 Raku documentation
https://docs.raku.org/
Artistic License 2.0

Undocumented `{expression}` evaluation behaviour in regex pattern #4112

Open zacque0 opened 1 year ago

zacque0 commented 1 year ago

Problem or new feature

{expression} is evaluated when it appears in a regex pattern, yet this behaviour isn't documented anywhere in doc/doc/Language/regexes.pod6, even though {expression} is used in other examples in regexes.pod6.

Sample code:

use v6.d;

my $count = 0;

my $str = "1, 2, 3, 4";

if $str ~~ / [\d+ {$count++} \D*]+ / {
    say "There are $count digits in total";
}

Output: "There are 4 digits in total".

Suggestions

I'd like this behaviour to be documented explicitly. Thanks!

coke commented 1 year ago

https://docs.raku.org/language/regexes#Regex_interpolation

zacque0 commented 1 year ago

https://docs.raku.org/language/regexes#Regex_interpolation

To elaborate: syntax-wise, {expression} is somewhat related to this section, which describes $var, $(code), <$variable>, and <{code}>. Semantic wise, {expression} is not about regex interpolation, but more like executing code for its side effects, so I don't think it fits into any existing section. I'd suggest putting it under the first section: https://rakudocs.github.io/language/regexes#Lexical_conventions
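For illustration, here is a minimal sketch contrasting a bare {expression} block with the interpolation constructs listed above. It assumes a current Rakudo with `use v6.d`; the variable names are illustrative only. The bare block contributes no characters to the match and simply runs, while each of the other forms splices in something the regex then has to match.

```raku
use v6.d;

my $hits = 0;
my $lit  = 'b';
my $pat  = 'b+';

# {...}: runs code for its side effect; it matches no characters itself
my $m1 = ('abc' ~~ / a { $hits++ } b c /).Bool;

# $var: the variable's value is matched as a literal string
my $m2 = ('abc' ~~ / a $lit c /).Bool;

# $(code): the block's result is matched as a literal string
my $m3 = ('abc' ~~ / a $( $lit ) c /).Bool;

# <$variable>: the variable's value is treated as a regex
my $m4 = ('abbbc' ~~ / a <$pat> c /).Bool;

# <{code}>: the block's result is treated as a regex
my $m5 = ('abbbc' ~~ / a <{ $lit ~ '+' }> c /).Bool;

say ($m1, $m2, $m3, $m4, $m5);   # (True True True True True)
say "The bare block ran $hits time(s)";
```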

raiph commented 1 year ago

Semantic wise, {expression} is not about regex interpolation

I get where you're coming from, but let me argue otherwise and find out whether you can see another way of looking at things.

The dictionary definition of the English word "interpolation" is the insertion of something (typically something "foreign" or even "spurious") into something else.

I think it's appropriate to ground the decision about how to document Raku's interpolation features in the meaning of the English word, to the degree that meaning can reasonably cover the Raku meaning and its semantics.

Raku regexes (indeed most modern "regexes") are semantically just general-purpose code. The fact that Regex ~~ Method is True isn't just about invoking whole regexes but also about their sequential execution model, in particular the fact that general-purpose code can be interleaved with code that's exclusively about pattern matching and capturing.

Thus, while a {...} inserted into a regex doesn't participate in the regex's execution in the more obvious ways most other interpolations do, especially in the simplest regex languages/engines that do support some interpolation, I think {...} in a regex is nevertheless semantically an interpolation.
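To make the sequential-execution point concrete, here is a small sketch, assuming a current Rakudo with `use v6.d` (the `@log` array exists only for illustration). The type check shows that `Regex` is a subtype of `Method`, and the log records plain code blocks running in order, interleaved with the atoms that do the actual matching.

```raku
use v6.d;

# Regex is declared as a subclass of Method, so this type check succeeds
my $is-method = Regex ~~ Method;
say $is-method;   # True

# Code blocks run in sequence with the surrounding pattern atoms
my @log;
'ab' ~~ / a { @log.push: 'after a' } b { @log.push: 'after b' } /;
say @log;         # [after a after b]
```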

So, from my perspective:

Thoughts?

zacque0 commented 1 year ago

(a) I disagree. You're arguing that since Raku regex is equivalent to general-purpose code, it makes sense to accept code inclusion as a form of regex "interpolation". However, whether or not regex is code doesn't matter here because it's an implementation detail: it might be compiled to native (byte-)code or be interpreted by some regex engine, but we shouldn't care. I see Raku regex as a DSL of its own, so it's best to examine the semantic of {expression} from regex POV instead of from code POV.

(b) That said, your comment does give me a fresh perspective on regex "interpolation". As you said, "interpolation" is adding something to the mix, and in this case, the mix is the regex. I'd always thought of "interpolation" as adding something concrete to the mix. Now, in mathematical terms, I see it as something like A + 2, where A is the mix and 2 is the interpolated content. So, by viewing interpolation as addition, we can naturally expand its meaning to cover adding nothing to the mix, i.e. A + 0, where 0 is, in this case, the {expression} content executed mainly for its side effect(s). Then, who knows, maybe in the future we might extend the meaning of "interpolation" even further to include adding something negative, e.g. A + (-2)! From this POV, we can put {expression}, :my ...; and friends into the A + 0 category.

(c) While the arguments in (a) and (b) are in favour of adding these constructs to the interpolation section, I question the decision to do so. General readers, like me before I understood argument (a) or (b), might not expect to find these constructs under the "Interpolation" section. So, either you have to educate your readers or to adapt docs to their general level of understanding.
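The `:my ...;` form mentioned in (b) declares a variable scoped to the regex itself. Here is a minimal sketch, assuming a current Rakudo with `use v6.d`, that reworks the issue's sample so the counter lives inside the regex and a final {...} block copies it out (`$n` and `$total` are illustrative names):

```raku
use v6.d;

my $total;

# :my declares $n inside the regex; each {...} block then runs for its side effect
'1, 2, 3, 4' ~~ / :my $n = 0; [ \d+ { $n++ } \D* ]+ { $total = $n } /;

say "There are $total digits in total";   # There are 4 digits in total
```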

zacque0 commented 1 year ago

Related to issue #2875

raiph commented 1 year ago

I decided to go read the relevant design docs. Here's Larry himself in Apocalypse #5, written in 2001:

just interpolate code using a closure: s/pat/{ code }/

My general view of terminology is to keep things simple. I frankly found the word "interpolation" scary when I first encountered it, but once I got the concept (a kind of weaving of something into something else) it was OK, and I wasn't surprised or confused when I found the above.
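In the quoted Apocalypse line, the closure appears in the replacement part of a substitution. In current Raku the replacement part of `s///` is interpolated like a double-quoted string, so a `{ ... }` there runs once per match and its result is inserted. A small sketch, assuming Rakudo with `use v6.d` (`$s` is an illustrative name):

```raku
use v6.d;

my $s = 'a1b2c3';

# The replacement is interpolated like a double-quoted string, so the
# {...} closure runs for each match; $/ is the current Match object
$s ~~ s:g/\d/{ $/.Str * 2 }/;

say $s;   # a2b4c6
```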

whether or not regex is code doesn't matter here because it's an implementation detail

What I was trying to talk about is not an implementation detail.

Let's forget my mention of code. For now let's assume discussion of it being code is a red herring.

(I thought it would help you see, but I think it's actually made things worse. It doesn't matter if it ends up as bytecode or whatever. We shouldn't care -- and I don't. That wasn't what I was trying to convey. Nor am I meaning to appeal to the fact Raku regexes are code that gets compiled. My point would still apply if they were just strings that get interpreted.)

I see Raku regex as a DSL

Agreed.

on its own

I hear you say you see it as that, but it isn't on its own.

Consider some simple examples:

What's going on with Raku's take on regexes is vastly more profound than those two simple examples, but they will hopefully give you pause.

(You may find your mind instantly offers rationalizations of why those don't contradict viewing the Raku regex DSL as a thing on its own. But trust me: it isn't a thing on its own, and that's a good thing, and we don't want to try too hard to pretend otherwise, except in some of the beginner material such as X-language-to-Raku guides, simple tutorials, and the like. In reference docs we need to present Raku simply, but not so simply that we inculcate terminology and mental models that will confuse learners or otherwise unduly let them down as they get into Raku.)

so it's best to examine the semantic of {expression} from regex POV instead of from code POV.

Note how this either/or notion conflicts with Larry Wall's vision, as expressed in Apocalypse #5:

if we emancipate regexes to serve as co-equal control structures, and if we can rid ourselves of the regexist attitudes that many of us secretly harbor, we'll have a much more productive society than we currently do. We need to empower regexes with a sense of control (structure). It needs to be just as easy for a regex to call Perl code as it is for Perl code to call a regex.

He's not talking about the syntactic simplicity of {...}. That's trivia. He's talking about solving the problem described in the title of the section containing that paragraph: Poor integration with rich languages.

More generally, perhaps it's best to back up to something he says near the start:

I need to warn you that this Apocalypse is going to be somewhat radical. We'll be proposing changes to certain "sacred" features of regex culture, and this is guaranteed to result in future shock for some of our more conservative citizens. Do not be alarmed. We will provide ways for you to continue programming in old-fashioned regular expressions if you desire. But I hope that once you've thought about it a little and worked through some examples, you'll like most of the changes we're proposing here.


General readers ... might not expect to find these constructs under the "Interpolation" section. So, either you have to educate your readers or to adapt docs to their general level of understanding.

Agreed.

Raku innovates in ways that simultaneously simplify, generalize, and power up many things but, precisely because they're innovations, we have to educate.

How best to do that?

I'd certainly expect X-to-Raku nutshells, guides, tutorials to err on the side of focusing on the initial level of understanding one might expect from familiarity with X. And I'd say that there should be simple tutorials for highly general aspects that also err on the side of familiarity.

But the general reference material? I think that's the place where outline educational structuring is appropriate, and that using "interpolation" in the way Larry did is of that ilk.

Maybe. At this point I'm just wanting to open your eyes to a different way of looking at things. I do not want you to lose the prior perspective you had. The ideal is you keep both perspectives alive simultaneously, letting your mind explore possibilities, and wait for something to pop out as an elegantly simple way to structure and express what needs to be documented.

2colours commented 1 year ago

I don't think Raku has enough material to deliver both conventional content for beginners coming from outside and an elaborate, consistent view of the language in its own terms. Nor do we have enough resources to do that.

Also, I'd strongly advise against this attitude that we somehow need to get people to learn Raku the way a long-time expert user sees it, making them (re)learn a lot of terminology with basically one goal in mind: that they eventually say "wait, it all makes sense that way". It seems kind of elitist, which I don't think was ever the intention of the Raku design process.

What I particularly don't like as a precedent is this anglocentric "word wizardry", where a native speaker's grasp of English basically trumps widely known CS terminology. I think this is both counterproductive and strongly at odds with the declared inclusivity mission of the whole Raku project. This contradiction is present in the language as well, with the "negation lifting" or whatever it's called (the negation meta-operator disobeying De Morgan's laws with any and all in order to appeal to a hypothetical naive native English speaker). That's a whole different set of issues that I will eventually return to, though not in this issue.

Finally, with all due respect to raiph for reading the Apocalypses and Synopses for guidance, I think they are treated in too... biblical a way. We are in the post-Larry era, so even if something makes sense and is still relevant at all, it can happen that we just cannot synthesize it in a meaningful way, simply because we don't have the same vision as Larry Wall does/did. Starting from the broad and vague declaration of intent that regex and code should interoperate naturally, and arriving at the conclusion that plain, unmarked code inclusion in a regex is to be expected, seems a far stretch.

Having said that, I'm struggling to come up with something fundamentally better, at least not without restructuring of hard-to-estimate scope.

In the end, adding code inclusion as a kind of interpolation is better than nothing, and it's pretty simple.

If we want to move beyond that, I'd suggest developing a more general overview of the regex slang: how to invoke it (1), what values it knows (2), what operations it knows (3), and how it can integrate with the "main slang" (4). The "interpolation" segment could be divided between (2) and (4) in this numbering, even redundantly.

This really depends a lot on one's vision of the documentation. From my perspective, it's mostly an illustrated reference where you can look up syntax and core language features in general. It serves this purpose best if it acknowledges people's different backgrounds and adapts to them, not by being overly didactic but by offering the least possible surprise. Of course it's impossible to optimize for everyone equally, but at least when it's clear that someone who hasn't read the Raku design materials from day 0 will be surprised, "let's rather not do it".

jubilatious1 commented 11 months ago

Steal example code from here to write documentation:

https://github.com/rakudo/rakudo/issues/3564