Unclear sharing of `$/` causing race conditions

lizmat commented 12 months ago

The following code produces consistent results:

grammar G { token TOP { \w+ } }

sub summarize() {
    ("aaa" .. "zzz").map( {
        my $result = G.parse($_);
        $result.Str.comb.Set.keys.ords.sum  # something to keep CPU busy
    } ).sum
}

say summarize;  # 6615297

However, if we put a .race or a .hyper in the mix (as in ("aaa" .. "zzz").race.map( {), the results become random with only occasionally producing the correct result.

This can be fixed by adding a my $/ inside the code block of the .map. Or by changing the block to a proper sub (("aaa" .. "zzz").map( sub ($_) {).

So clearly this problem is caused by the block in the map not having its own lexical $/, so that G.parse is setting the $/ that is implicitly defined outside the block, inside sub summarize.

In 6.e currently Grammar.parse (and friends) will not set $/, and then the above code will work as it should. However, that is a breaking change that we may not want to keep. On the other hand we may want to keep this change to reduce "action at a distance" more generally.

2colours commented 12 months ago

Massive +1 for eliminating this "action at a distance" in the future (whether it's as soon as 6.e or later on).

For setting (and only that - not reading!) $/, I realize now that I didn't express myself clearly on IRC. I will try to use "it" less often...

I don't feel strongly about setting $/ or not. People more like AlexDaniel would be very much in favor of eliminating $/, I think. I don't mind eliminating $/ but I don't mind keeping it either, as long as only the end user reads the value of it.

I could even imagine a pragma for these implicit "magic variables" that are set in operations, something like use result-vars, if we think it should be opt-in.

lizmat commented 12 months ago

~~In an unexpected turn of events~~:

grammar G { token TOP { (\w+) } }  # add () to set $0

sub summarize() {
    ("aaa" .. "zzz").race.map( {
        G.parse($_);
        $0.Str.comb.Set.keys.ords.sum  # something to keep CPU busy
    } ).sum
}

say summarize;  # 6615297

~~It appears that explicitely mentioning $/, or implicitely with $0 in the block scope, will codegen a lexical $/ in that scope.~~

~~So it looks more like this is a bug / deficiency in not noticing that $/ is going to be accessed, that is preventing $/ from getting its own lexical copy.~~

Sadly my testing was bogus, probably because I used a sub inside the map instead of a just a block.

2colours commented 12 months ago

@lizmat does this effectively mean that all matchings should create their own $/? In which case the name of that variable could really be anything?

lizmat commented 11 months ago

Further golf of the code:

say ("aaa" .. "zzz").race.map({
    / \w+ /;
    $/.Str.comb.Set.keys.ords.sum
} ).sum

So note that this problem is actually more general, and not limited to using grammars.

2colours commented 11 months ago

Is the problem because of the user-space access of $/ or would it also happen with $match = .match(/ \w+ /); $match.Str.... as well? I would say that makes a big difference because the user-space access is "clear sharing".

jubilatious1 commented 11 months ago

Don't see any reason a user should set $/ ever. How to prohibit?

lizmat commented 11 months ago

A similar situation may actually make sense:

$_ = "foo";
if 42 {
    m/ \w+ /;
}
else {
    m/ \d+ /;
}
say $/;  # ｢foo｣

However, there are a few strange things happening here. If we remove the m from m/ \w+ /, the outer $/ is Nil in the legacy grammar, but remains ｢foo｣ in the new Raku grammar.

patrickbkr commented 11 months ago

Sharpening my understanding. Is the following a correct description of the situation? Subs and methods always bring their own lexical $/, blocks do not. That $/ is set by regex calls and Grammar.parse and friends. https://github.com/rakudo/rakudo/commit/8542ad2116 changed it so Grammar.parse and friends don't set $/ while Regex calls still do. Currently noone suggests to get rid of regexes setting $/. Also there is ~~an unexplained behavior~~ bug in the legacy grammar where $/ seems to not be set in a corner case.

patrickbkr commented 11 months ago

A similar situation may actually make sense:
$_ = "foo";
if 42 {
    m/ \w+ /;
}
else {
    m/ \d+ /;
}
say $/;  # ｢foo｣
However, there are a few strange things happening here. If we remove the m from m/ \w+ /, the outer $/ is Nil in the legacy grammar, but remains ｢foo｣ in the new Raku grammar.

$_ = "foo";
{
    / \w+ /;
    say $/;  # ｢foo｣
}
say $/;  # ｢foo｣

$_ = "foo";
{
    / \w+ /;
}
say $/;  # Nil

I'd firmly classify this as a bug. I would have expected for $/ to be ｢foo｣ in all of the above cases.

lizmat commented 11 months ago

@patrickbkr Yes, that is a bug. Fortunately, this is already fixed in RakuAST, so I won't be spending time on trying to fix this in the legacy grammar.

lizmat commented 11 months ago

After discussing this with @jnthn, a lot has become clearer to me.

Basically, the functionality of $/ is based on nqp::getlexcaller which states: "Looks up the lexical with the specified name, starting at the calling frame. It checks all outer frames of the caller chain."

I didn't really grok this, and will probably adapt the text to make it clearer in what it does.

The pseudo code for matching is basically:

method Str::match(...) {
    nqp::getlexcaller('$/') = match-result;
}

What nqp::getlexcaller does, is walk the caller stack to see if a lexical $/ is known in that scope. If there isn't, it will walk the OUTER scope of that scope to find a lexically defined $/. Now in practice, this means it will always just find the $/ in one of the OUTER scopes of the caller scope, because there is bound to be a subroutine / method / mainline with a $/ in the OUTER scope of the caller's scope.

This, combined with the fact that blocks do not declare a lexical $/, creates the situation that allows the above examples to work.

Now, this is a very elegant design that works like a charm in a single-threaded world. But in a multi-threaded world, this is a nasty WAT. For example:

start { "foo" ~~ / \w+ / }
sleep 1;  # give thread startup a little time
say $/;  # Nil

Huh? Why doesn't that say ｢foo｣?

Promise.start: { "foo" ~~ / \w+ / }
sleep 1;  # give thread startup a little time
say $/;  # ｢foo｣

Isn't that supposed to be the same?

Yes, it is, but the grammar handling of start { } stealthily adds a my $/ to the block.

start { say MY::.keys }  # ($/ $_ $!)
sleep 1;  # give thread startup a little time

versus:

Promise.start: { say MY::.keys }  # ($_)
sleep 1;  # give thread startup a little time

If there is a keyword in the Raku grammar that allows for special compile time tweaking of the opcode, it can be done. However, in the case of Promise.start, it's just a method called on a class with a block as an argument. And applying heuristics on that, would be very fragile, and possibly even harder to explain and document.

Conclusion: I think we need to document this behaviour of $/ more thoroughly, and add it as a possible trap.

Going back to the golfed code, adding a my $/:

say ("aaa" .. "zzz").race.map({
    my $/ = / \w+ /;  # define lexical $/ to prevent race condition
    $/.Str.comb.Set.keys.ords.sum
} ).sum

is a simple workaround that I believe can be easily explained and documented.

2colours commented 11 months ago

I think this issue is getting fragmented.

The original issue that I responded to was about $/ acting as a hidden, internal parameter in the parsing process. Note that the code had no appearances of $/, yet it was affected by the scoping of that variable. That's an implementation topic.

The more recent comments were about how $/ works when somebody actually and deliberately uses it. That's more of an interface-level topic.

Let's decide which issue is the one discussed here. Another issue can be opened for the other phenomenon.

(Honestly, since the original code snippet doesn't have $/ as a part of any interface, I tend to think that the original topic should be treated as a mere bug, even if it has implications for existing code for some reason. Perhaps it shouldn't.)

lizmat commented 11 months ago

The title stated the subject. The G.parse was just a part of the problem, which is now fixed in 6.e by not letting Grammar.parse set $/ anymore.

jubilatious1 commented 11 months ago

Do we need a @/ Array for Grammars?

lizmat commented 11 months ago

@jubilatious1: no

2colours commented 11 months ago

which is now fixed in 6.e by not letting Grammar.parse set $/ anymore

The problem was less the setting of $/ and more the internal reading of it, though. Of course this can be radically solved by eliminating all internal references to $/ but that doesn't seem necessary, as long as the implementation only uses $/ as an output value. Or what am I missing?

vrurg commented 11 months ago

I would stick to the state of affairs where in 6.e $/ is not set. Because it brings in more problems than it solves.

As far as I understand, most concern with removal of $/ is $<match> syntax. I personally don't see a big deal in something like:

if /.../ -> $m { say $m<match> }

But as long as we give special meaning to the topic variable, we can give it to $/ as well, except that it would have been explicitly declared:

if /.../ -> $/ { say $<match> }

If things go for dropping it out then I wouldn't care much about the bugs in 6.c and 6.d. They could be documented for the sake of clarity.

If $/ is preserved then best if it becomes block-local default where a regex is used:

sub foo {
    # my $/
    if condition() {
        # no $/ 
        @a.map({ #`{ my $/ } /.../ });
    }
}

I.e. the block within map gets its own lexical $/.

This behavior would be easier to document and explain. It wouldn't fix cases like @a.race.grep(/.../), but simplify explanation of why it doesn't work as expected.

Though, as a side note, the above snippet with grep is another good example as to why $/ is bad: barely anybody would use it but each run of the regex wastes a bit of CPU time on setting the variable.

Speaking of grammars, to support them the compiler must now that G.parse is a call to grammar's parse and not something else. It is possible to check what G is and make decisions about installing or skipping $/, but does it worth the complications of compiler implementation?

To conclude, it looks a lot like trying to fix it doesn't worth the time and the energy spent.

FCO commented 11 months ago

As we do on action methods...

lizmat commented 11 months ago

FWIW, from a performance point of view, I would be very much in favour of not setting $/ automatically with / ... /, or even .match, or even .subst!

"foo".match(/.+/);
say $/;                        # ｢foo｣
say "FOO".subst(/.+/, "bar");  # bar
say $/;                        # ｢FOO｣

But it will break a lot of code out there :-(

vrurg commented 11 months ago

@lizmat make it a 6.e?

jubilatious1 commented 11 months ago

@lizmat said:



_But_ it will break a **lot** of code out there :-(

Changing $/ would break a significant percentage of my U & L Stack Exchange Answers:

https://unix.stackexchange.com/users/227738/jubilatious1

No way to throw an error on hyper/race code that touches $/ instead?

@TimToady @thoughtstream @pmichaud

lizmat commented 11 months ago

@jubilatious1: would that code also be broken if .match and .subst would no longer set $/ ? Because I personally find that to be a worse "action at a distance" than just / foo /, which clearly is magic, and as such could be expected to set $/.

Also, I wonder how many people know that .match and .subst set $/ at all. That .parse sets $/ was even news to the architect :-)

jubilatious1 commented 6 months ago

IMHO, you're proposing eliminating the "Baby Raku" that I 'grew up' learning. I can look back in my history and find REPL examples of the variety:

~ % raku
Welcome to Rakudo™ v2023.05.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2023.05.

To exit type 'exit' or '^D'
[0] > say $/ if "cow" ~~ "cow";
Nil
[0] > say $/ if "cow" eq "cow";
Nil
[0] > say $/ if "cow".match(/cow/);
｢cow｣
[0] > say $/ if "cow".match(/co./);
｢cow｣
[0] > say $/ if "cow".match(/co(.)/);
｢cow｣
 0 => ｢w｣
[0] > say $/ if "cow".match(/cot/);
()

Efficiency isn't everything: you need entree-points into the language. Otherwise, how will people learn the Raku regex-engine?

Finally, aren't you worried about unintentional breakage? Will something like the following still work without setting $/ (or $<>)? Aren't you counting the number of times $/ (or $<>) captures, below?

raku -ne 'BEGIN my $i; ++$i for m:g/ … /; END $i.say;'

(FYI, the bare-bones / … /matcher (without the leading m) implies to me that less work is being done by the language, not more. It's degenerate. Quote-unquote magic and magical actions should be reduced to a minimum).

@TimToady @thoughtstream @pmichaud

raiph commented 6 months ago

@jubilatious1

"magic" and "magical actions" should be reduced to a minimum

What do you mean by "magic"? What difference do you mean when you separate "magic" from "magical action"?

In particular, I know what I mean by those words, and would appreciate you explaining why you would disagree with what I think they mean:

"magic" and "magical action" are more or less the same thing.
Implicitly declaring a local $/ is magic.
Implicitly declaring a local $/ conditionally based on non-obvious rules is magic.

Following the principle you suggest, we should consider dropping the above magic for 6.e.

I for one think we should consider doing so. (I personally think magic can have its place, and imo sufficiently tried and trusted magic should not be reduced just because it's magic. That said, imo it's a hot prospect for the chopping block if instead it's sufficiently tried and distrusted.)

Among many possible motivations for going beyond mere consideration of such minimization is that we need to either double down on the existing magic to make the magic thread safe or we need to adequately address the downsides of the magic -- and we need to allow that there may be no adequate way to address the downsides, and 6.e may well be an opportunity to make a break we need to make.

Note that that is about code randomly being silently incorrect, or crashing, depending on indeterminate variations between runs. It's not about having good ergonomics, which is nice to have, nor is it about good performance or efficiency, which is also nice to have, but is instead just about straight up the worst problem of all -- code that silently unpredictably does the wrong thing due to magic.

jubilatious1 commented 6 months ago

@raiph , not sure if I have an answer for you yet. However, my experience tells me this is easy:

Sort a list of words by last letter:

~ % echo 'cat dog horse' | raku -ne '.say for .words.sort: *.match(/ .$/);'
horse
dog
cat

This is less easy, I'm clearly going for a two column linewise output of "word, last-letter" here. But I don't get it:

~ % echo 'cat dog horse' | raku -ne 'my @a = .words; say ($_ , $/) for @a.map: *.match(/ .$/);'
(｢t｣ ｢t｣)
(｢g｣ ｢g｣)
(｢e｣ ｢e｣)

So I re-configure, but (again, fiddly) I have to parens the match clause to get it to work:

~ % echo 'cat dog horse' | raku -ne 'my @a = .words; for @a { say "$_ ", ($/ given .match(/ .$/))};'
cat ｢t｣
dog ｢g｣
horse ｢e｣

So maybe the take-home is if you give users access to a variable such as $/ (or $<>) that gets "populated" when a match occurs, don't you expect them to make use of it?

[[ Extra Credit: now sort the last codeblock in last-letter order. ]]

@TimToady @thoughtstream @pmichaud

raiph commented 6 months ago

@jubilatious1

Two overall points. If you don't understand something, it's good to only use it to try understand it. And if something matches your expectations, that itself doesn't mean it should. (I've written both these points in a manner that may make it unclear what it is I'm talking about. And that is a key part of the point I'm making. If you're confused, consider just reading the rest of this comment and returning to this opening paragraph if/when you think you might understand it.)

I have to parens the match clause to get it to work:

No you don't:

my @a = .words; for @a { say "$_ ", .match: / .$/ }

You were just confused about it.

maybe the take-home is if you give users access to a variable such as $/ (or $<>) that gets "populated" when a match occurs, don't you expect them to make use of it?

Exactly.

Hence we need to do something about the thorny problems that has created.

First comes the technical problem. While $_ can be problematic because some users don't understand it, at least it's naturally used as a local and lexical variable by both built in features and users. That means that users have a decent chance of learning to use $_ correctly despite any initial confusions (such as the ones you illustrated in your comment).

In stark contrast $/ is naturally used as a global and dynamic variable by both built in features and users. For local and lexical scenarios such as one liners the balance between the utility and convenience of $/ is approximately the same as it is for $_. But in just about all other circumstances it's "magic" of exactly the kind that no sane person who understands what it's doing really wants in Raku (other than for reasons of maintaining backward compatibility for code that needs it).

And that leads to the consequent problem. What do we do about this problem?

A lot of us are hoping we can clean up aspects of Raku like this that create huge problems with very little benefit. The globality and dynamicism of $/ is one of the most long standing of such problems, especially the degree to which it is undisciplined.

But those who don't really understand the problem (and I take it that that includes you) may only see a benefit we want to remove while not seeing the huge problems that come with the benefit. So our only option is to try develop more understanding in the user base in the hope we can build consensus supporting consideration of elimination of the problematic feature.

[[ Extra Credit: now sort the last codeblock in last-letter order. ]]

my @a = .words; say |$_ for sort *[1], @a. map: { "$_ ", .match: / .$/ }

jubilatious1 commented 6 months ago

@raiph It's an undeniable fact that if you remove the parens I referred to, you don't get the desired two-column "word, last-letter" output:

admin@mbp ~ % echo 'cat dog horse' | raku -ne 'my @a = .words; for @a { say "$_ ", ($/ given .match(/ .$/))};'
cat ｢t｣
dog ｢g｣
horse ｢e｣
admin@mbp ~ % echo 'cat dog horse' | raku -ne 'my @a = .words; for @a { say "$_ ", $/ given .match(/ .$/) };'
t ｢t｣
g ｢g｣
e ｢e｣

Most languages start from the premise that if you enter minimal code, you accept the defaults the language gives you (including operator precedence). By adding extra named-arguments, you can customize your output. Finally, if default operator precedence is a problem, you use the ultimate "nail clipping" solution of parenthesizing everything.

The concept of "magic" upsets the apple-cart: trivial things become difficult, and vice versa. Your concept of "magic" will most certainly not match my concept of magic. Plus, if you design you language "magic" to appeal to a hypothetical population (lets say...Perl programmers), then you end up confusing/frustrating those outside your putative audience. For some (part-time, scientific, data-sciency) programmers, this is a signal that it's time to walk away.

raiph commented 6 months ago

@jubilatious1

It's an undeniable fact that if you remove the parens I referred to, you don't get the desired two-column "word, last-letter" output

Sure. (I always run code that anyone shares as the very first thing I do as part of trying to help them.)

Most languages start from the premise that if you enter minimal code, you accept the defaults the language gives you (including operator precedence).

Sure.

Thus, for example, when you wrote for foo { say bar, baz given qux }, the "it" variable ($_) established by the for foo was shadowed by the "it" variable from the bar, baz given qux because the latter took precedence in accord with Raku's rules.

By adding extra named-arguments, you can customize your output.

Do you mean it can make code easier to understand if you break data out and store it in appropriately named variables? Assuming so, then yes, that sometimes makes sense.

Finally, if default operator precedence is a problem, you use the ultimate "nail clipping" solution of parenthesizing everything.

While using nail clippings may be what you prefer, I generally prefer to minimize parens that add to noise and instead look for a different solution, and I know a lot of Rakoons have the same preference.

That's why, when you wrote "I have to parens the match clause to get it to work" I thought you would be pleased to read that you don't have to.

But now I'm confused. Would you prefer not to read comments suggesting how you can solve problems in simple ways after you've suggested you have to do it an ugly or complicated way?

The concept of "magic" upsets the apple-cart: trivial things become difficult, and vice versa.

Yes.

Your concept of "magic" will most certainly not match my concept of magic.

Yes. And that makes matters worse.

Plus, if you design you language "magic" to appeal to a hypothetical population (lets say...Perl programmers), then you end up confusing/frustrating those outside your putative audience.

Yes. And that makes matters worse still.

For some (part-time, scientific, data-sciency) programmers, this is a signal that it's time to walk away.

Yes.

But imo, even more important than all the above negatives that can be associated with "magic", is when some magic provides tiny benefits and massive problems.

At that point it becomes almost absurd for a PL to continue to support those particular forms of magic.

And at a major PL version break like 6.e we get the opportunity to clean up the language without breaking old code. So why don't we at least consider doing just that?

jubilatious1 commented 6 months ago

@patrickbkr wrote:

However, there are a few strange things happening here. If we remove the m from m/ \w+ / ... .

Could you provide version numbers? And when/which-version you saw the change? Thx.

jubilatious1 commented 6 months ago

@raiph wrote:

So why don't we at least consider doing just that?

~ % raku -e 'for lines() { put join "\t", $_, $/, $0 if m/ ^ aa(.) / };' /usr/share/dict/words
aal aal l
aalii   aal l
aam aam m
aardvark    aar r
aardwolf    aar r

raiph commented 6 months ago

@jubilatious1

The piece of my comment you quoted has no relevance to the code you showed. I suspect you have misunderstood what I'm asking you to think about.

This issue discusses some technical problems that have long threatened to destroy the entire language. Do you recognize that?

This issue discusses options we have for solving them, options which would break backwards compatibility. Do you recognize that we need to very seriously consider them?

Almost none of what has been discussed in this thread, problems or solutions, has been about altering the behavior of the code you showed. The fact you shared that code suggests to me that you don't understand what is being discussed. Worse, you shared it without discussing specifically how it relates to specific points made in this issue.

This is part of the reason I asked you to be more specific about what you meant when you wrote:

(FYI, the bare-bones / … / matcher (without the leading m) implies to me that less work is being done by the language, not more. It's degenerate. Quote-unquote magic and magical actions should be reduced to a minimum.)

What do you mean? "Quote-unquote magic and magical actions should be reduced to a minimum." is fair enough provided you explain what you define as "magic". Until you do that it's ultimately unhelpful at best, confusing at worst.

Furthermore, comments about "magic" are typically used to argue against features such as / ... / setting the caller's $/. But you seem to be arguing for that magic, and for that magic to be preserved in 6.e. And both of those arguments are fair enough, and consistent with what just about everyone commenting in this thread has said may well be fair enough. So why introduce discussion of magic in either direction if you refuse to be specific about what specific instances of magic you're talking about and whether you are for or against them? Such vagueness doesn't help.

To help show that you understand (or don't) particular aspects of what is being discussed in this issue, please comment on your specific position relative to each of the following numbered aspects, and explain what impact you think each aspect would have on the code you just shared, if you think it would in fact have an impact:

The issue Liz originally raised when she opened this issue, which was about .parse, .subparse, and .parsefile magically setting the nearest my $/ declaration in the call stack. Her commit disabled that in 6.e.
Other methods like .match and .subst setting the nearest my $/ declaration in the call stack. She has suggested that we might also want to disable them doing that in 6.e.
A 6.e pragma that enables/disables this magic setting of the nearest my $/ declaration in the call stack. If we introduce such a pragma, does it default to disabled or enabled?
The aspect Liz mentioned in her commit message, namely adding a my $/; to the start of any block containing any literal mention of $/ or cousins ($<...> or $0, $1, etc. Do we add that to 6.e?
A 6.e pragma that enables/disables the adding of a my $/; to the start of any block containing literal mentions of $/ and cousins. If we introduce such a pragma, does it default to disabled or enabled?

TIA for following up. We need to make progress on dealing with thread (un)safety in Raku, and we need to take the opportunity of 6.e to confront things that will kill Raku if we're not bold enough.

That doesn't mean we break backwards compatibility willy nilly. But it also means we're not going to maintain backwards compatibility just because some users don't yet understand what the problems are. We need to talk this stuff through and get on the same page, or move ahead to secure the language's future, and this includes things as fundamental as thread safety of basic features.

Raku / problem-solving

Unclear sharing of `$/` causing race conditions #406