Closed (codesections closed 8 months ago)
I don't see a problem with <|w>.
It works as desired:
% raku
Welcome to Rakudo™ v2023.05.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2023.05.
To exit type 'exit' or '^D'
[0] > "apa pz" ~~ / <|wb> p. /
「pa」
[1] > "apa pz" ~~ / <|w> p. /
「pz」
The problem is <|wb>, which is mis-interpreted. Above, <|wb> returns 「pa」, which is the wrong result. Maybe <|wb> gets mis-interpreted as a quoted list? Shouldn't quoted lists forbid unescaped | within?
https://docs.raku.org/language/regexes#Quoted_lists_are_LTM_matches
So DON'T merge this pull request because you will:
<|w>
Regex code, and <|wb>
without understanding its source. @codesections @coke @lizmat @JJ @fecundf @doomvox
@jubilatious1 This PR is for the documentation, not for any actual functionality!
Thanks for clarifying, @lizmat !
What's wrong with a) keeping the <|w> syntax in the documentation, and writing spec-tests? Because <|w> already works as advertised on the docs website.
Assuming, how do we get an error thrown when <|wb> is incorrectly attempted (user meant <|w> instead)?
@jubilatious1
> What's wrong with a) keeping the <|w> syntax in the documentation, and writing spec-tests?
As I explained here (though omitting "if I had my druthers" at the end because I ran out of comment space), I feel the same way you do.
but not if a core dev has to do any of the work beyond reviewing/merging.
My view is that if users (like me or you) want RAST to support a currently unroasted feature then it's incumbent on us to organize and do all the work, including writing relevant issues and PRs, including spec tests and altering Rakudo, and (not too forcefully) lobbying for merging the work if any other users (including core devs) initially object to it being added to roast and RAST.
> how do we get an error thrown when <|wb> is incorrectly attempted (user meant <|w> instead)?
Rakudo would need to be altered to reject <|foo> unless foo is a single character, and, for now at least, only the single characters that have already been implemented. (That may well mean just <|w>. I haven't looked into it further than my SO answer.)
I'm not willing to take the lead on this, but if you are, I will do everything in my power to make it a successful effort, where "successful" means we go through the process to the end, regardless of whether the end is adding <|w> to roast and RAST or <|w> ultimately being declined as a feature.
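Until something like that lands in the compiler, a user-side check is about the only way to catch the typo. The following is a minimal sketch of such a check, not the Rakudo change described above: `suspicious-assertions` is a hypothetical helper that scans a regex's source text (as a plain string) for any <|name> whose name is longer than one character.

```raku
# Hypothetical helper, not part of Rakudo: list every <|name> assertion in a
# regex's source text whose name is longer than one character.
sub suspicious-assertions(Str $regex-source) {
    # Find all <|...> forms; the capture holds the name after the bar.
    my @found = $regex-source.match(/ '<|' (\w+) '>' /, :g);
    # Keep only those with a multi-character name, as plain strings.
    @found.grep({ .[0].Str.chars > 1 }).map(*.Str).List
}

say suspicious-assertions('/ <|wb> p. /');   # flags the typo'd form
say suspicious-assertions('/ <|w> p. /');    # nothing to flag
```

This only inspects text, so it knows nothing about which single-character forms Rakudo actually implements; it merely flags the multi-letter case that is silently accepted today.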
After looking at the S06 section @raiph linked, I see the merit of the <|...> syntax. For example (if implemented), writing <|d> would be a handy way to match either end of a run of digits.
But even if <| were implemented, it should be documented as a separate type of zero-width assertion -- not mentioned in passing in a section on word boundaries.
helps prevent new code from being written that relies on [unroasted] behavior.
Sorry if I'm being presumptuous, but I'm not convinced you've understood the situation. Here's what we're talking about:
say 'abc' ~~ / a <|quick> b <|brown> c /; # 「abc」
That is to say, if some code of a particular form is written (specifically of the form <|w>) and the w is not actually w but some other letter, or even multiple letters, then the compiler accepts it -- and then does nothing whatsoever with it.
So we'd be preventing someone doing the combination of A) writing a particular form of nonsense code that quite possibly no one has ever written until it was typo'd a few days ago; and that person then B) relying on their nonsense code doing nothing at all; and then that person C) not mentioning to anyone this odd (non) behavior they're relying on!
I think the chances of that recurring are minuscule, and if it did, the consequences would be minuscule. My guess is you hadn't realized it's this minuscule.
I was reading regex guide now, really interesting seeing raku seemingly being the only ones trying to innovate on text matching further than what perl did decades ago. With this merge it made the docs inconsistent, it says To match any word boundary, use <?wb>, but in the code example immediately following it is using <|w> rather than <?wb>, with no other mention of <|w> anywhere.
Hi again @bo-tato
> With this merge it made the docs inconsistent, it says To match any word boundary, use <?wb>, but in the code example immediately following it is using <|w>
I just submitted a commit (which was meant to be a PR) that changes the regex page in the doc's git repo, and also a PR. Once they've gone through, they will have eliminated all mentions of |w in the raku/doc repo.
> no other mention of <|w>.
I found some more. But as per my prior comment, I've now queued up changes intended to remove all mentions of |w throughout the raku/doc repo.
It might help to try to clarify the underlying confusion visible in this issue. (Or it might just make it more confusing for you. Feel free to skip this bit until my next quote of what you wrote! 🤣)
<|w> was "spec"d sometime in the last couple decades. By "spec"d I mean it was speculated to become a feature of Raku (née P6). By "speculated" I mean it was discussed in the design docs for Raku as if it was to be implemented in an "official" compiler (read "Rakudo"). By "official" I mean a compiler that implements "roast". By "roast" I mean... hold that thought.
<|w> was never "roast"d. Or, if it was, it got removed sometime in the last couple decades. By "roast"d I mean it was made part of the "official" Raku. By "roast" I mean the "repository of all specification tests". This is stored in its own repo (hosted at https://github.com/Raku/roast). It is the official digital specification of Raku.
So we have the situation in which Rakudo, which implements the "spec", where "spec" means "roast", also implements features that are not in roast but were instead just in the "spec", where "spec" means design docs whose official status is something along the lines of "speculation about Raku's specification".
(For the sake of completeness, Rakudo also implements some features that aren't in either roast or the design docs. For example, dd. And some of those features are in the doc, and can reasonably be relied on by users of Rakudo. I think it was agreed that they could be in the "official" Raku doc, but should be marked out as being Rakudo specific, not part of official Raku. In addition Rakudo also implements some features that are marked in the compiler source code as internal features. They can be relied on by core devs but should not be mentioned in the official Raku doc. There are other variations too but this comment is getting way too long.)
Bottom line: <|w> happens to work in recent Rakudos but isn't official Raku and doesn't belong in the Raku doc.
(All of this is despite my view that it would be cromulent if it one day returned, along with cousins like <|h> et al. Unless and until someone steps up to make that happen, I support cleaning up loose ends.)
> rather than <?wb>

<?wb> is valid Raku in the sense it's in roast. So it (at least in general, in principle) must be, and is, supported by every working release of Rakudo, as tested by the tests (which normally block a release if any of them fail -- all 200K or so of them), and "should" be in the doc.
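For comparison with the REPL session at the top of the thread, here is a minimal sketch of the spellings the regex documentation describes for word boundaries, run against the same "apa pz" string. The variable names are only for illustration, and I'm assuming a recent Rakudo.

```raku
my $text = "apa pz";

# <?wb> asserts a word boundary at the current position (zero-width).
my $any-boundary  = $text ~~ / <?wb> p. /;

# << asserts a left word boundary only, i.e. the start of a word.
my $left-boundary = $text ~~ / << p. /;

# <!wb> asserts that the current position is NOT a word boundary.
my $not-boundary  = $text ~~ / <!wb> p. /;

say $any-boundary;
say $left-boundary;
say $not-boundary;
```

The first two anchor the match at the start of the second word, which is what the <|w> example earlier in the thread does; the third can only match mid-word.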
(Heh. Turns out that while <|w> isn't in roast as a feature to be tested, it is used in one of the roast tests!
No, I'm not going to do a PR to change that. At least not yet. I've done the commit and PR in the doc repo; let's see how that works out...)
> raku seemingly being the only ones trying to innovate on text matching further than what perl did decades ago.
Well Machine Learning has a thing or three to say about that (and Anton Antonov may have something to say about it in the context of Raku), but yeah, with highlights/lowlights including TGCs, Rules, RCRE, Slangs, LOP, Inlines. But I'll leave discussion of these ways in which Raku(do) are (or aren't) advancing the state of the art of text processing for other times/places.
The docs currently refer to <|w> as a synonym for <?wb>, but this doesn't seem to be valid syntax – it's not spec'd in Roast, it isn't documented elsewhere, and it doesn't work in Raku AST. So this commit removes it from here. See https://stackoverflow.com/q/78069120 for details.