Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0
70 stars 16 forks source link

Smartmatching to S/// is not a bug but a feature #350

Closed 2colours closed 2 years ago

2colours commented 2 years ago

'almafa' ~~ S/a/e/ (along with any smartmatch to an S/// construct) produces a "worry", even though the already existing behavior is reasonable and already has analogous precedents with other regex quoting constructs.

To provide some context regarding the phenomenon:

Here I am going to argue that the behavior should not be considered "worrying", without any modification otherwise.

  1. Smartmatching =/= given/when The core idea behind labelling smartmatch with S/// as "useless" is that when S/a/e/ {} is indeed useless. However, we do use smartmatching with completely different motives than given/when structures. The documentation is a good example: it's full of code that uses smartmatching to regexes in order to obtain the match object, for example: say "Zen Buddhists like Raku too" ~~ $regex; # OUTPUT: «「Raku」␤». If smartmatching exists (and is commonly used) outside of boolean context, why not acknowledge that smartmatching per se isn't a problem with S///?
  2. Strong analogy with smartmatch to m// or // This is mostly a refinement of the previous point. $a ~~ S/x/y/ and $a ~~ m/x/ aren't only visually similar; their behavior is strongly analogous:
    • setting the match variable $/, based on the matched pattern
    • returning according to the operation (m matches, S substitutes)
  3. The advice of the worry message ("You can use given instead") isn't convincing While ~~ is an operator that you can use as many times in an expression as you wish, given is a statement modifier that you can only use once in a statement. They are on different conceptual levels. The situation can be improved by using andthen for setting the topic variable, or using a .&{ S/// } call on something; however, if your intention with using smartmatch for substitution was clear, these all still seem workarounds, not proper alternatives.
  4. $foo ~~ S/a/b/ ~~ S/c/d/ is a good-looking paradigm :) I admit that this might be an unusual take but I definitely consider it a pro, worth mentioning. Thanks to the topic-setting nature of smartmatching (i.e the same thing that makes substitution with smartmatching possible in the first place), this chain will (although with redundant steps) execute just right, hence making S/// itself more useful. I wouldn't be surprised if the execution could be optimized almost with the same effort as the "worry" is generated; the syntax to recognize is almost the same. Mind you: this works "for free", completely respecting the properties of smartmatching - except the unwritten, implicit, already violated expectation that it should be boolean-centric.

For me, this issue was already insightful, and I think it's good to think about smartmatching for a while. I personally think that as easily as removing a "worry" and a "note" from the documentation, we could increase both consistency and fun, and that might be the easiest way out. However, there are theoretical implications (mostly about smartmatch but also about these regex "operators") that can be investigated in many different ways.

salortiz commented 2 years ago

If your real concern is the "worry" produced by the compiler, these can be turned off by a simple 'no worries` pragma.

{ no worries; say 'almafa' ~~ S/a/e/ } # False

But I don't understand your use case.

The S/// operator returns a Str, not a Regex, so when used as the RHS of the smartmatch operator ask for a string equality comparison that returns a Bool, that will be True if and only if the substitution requested becomes a NOP (i.e. nothing was substituted). Anyway the result of the substitution is lost.

Am I missing something?

2colours commented 2 years ago

If your real concern is the "worry" produced by the compiler, these can be turned off by a simple 'no worries` pragma.

I know but I don't think that's a "real solution". If something is conceptually not "worrying", it shouldn't produce worries in the first place - otherwise the whole concept of a "worry" turns useless.

I really had a hard time figuring out what you are talking about - let me ask you: have you run the snippet you posted, or any of my examples? Because what you imply is just not the real behavior. It resembles TimToady's original message that got fixed by Zoffix in 2016.

I'm not sure if it's even right to call m//, S///, s/// and the likes "operators" - mind you, they all behave in a way that contradicts your assumption here.
Your assumption seems to be that they are executed right away, with huge precedence. If that was the case, $foo ~~ m/test/ would also be useless because the right handside would downright match against the topic variable and then the smartmatch would turn into string ~~ match. Similarly with s///: it would modify the original topic variable and then end up smartmatching a string against a match.

The actual relevance of the smartmatch operator ~~ in these cases is that it sets the topic variable, which is part of its definition. The "regex operators" are only executed afterwards - that's why all these work at all.
One of my key points is that these cases of smartmatching already produce non-boolean values that are used outside of boolean context (as well). It's just that unlike S///, they are assumed to be useful in boolean context (as well).

salortiz commented 2 years ago

let me ask you: have you run the snippet you posted

Yes, sure. No boolean context at all, I use a simple say to show the return value.

The fact is that both m// and s// returns a Match if succeeds, or Nil:

my $str = 'aaaa';
say $str ~~ /o/; # Nil
say $str ~~ /a/; # 「a」
say $str ~~ s/a/x/; # 「a」 (and $str was modified)
say $str; # xaaa
say $str ~~ s/o/y/; # False (because s// fail so the expression becomes `$str ~~ Nil`)

That is so because a Match at the RHS of ~~ always return its LHS. (And foo ~~ Nil returns False)

But the S/// case is completely different, because it always returns Str (the original one if the match fails and the substituted one if succeeds) so ~~ always returns Bool,

Yes, the first step of ~~ evaluation is temporal binding the topic ($_) with its LHS, but the second step is evaluate its RHS, then (third and finally) the result of the previous step is used as the invocant of .ACCEPTS($_), determining the final result.

I put emphasis in temporal because:

$_ = 'bar'; my $str = 'aaaa'; $str ~~ /a/; say $_; # bar (the $_ binding to 'aaaa' by `~~` gone)

So, the results of the Match (regex) cases are clear, but I really don't know what you want to obtain from 'foo' ~~ S/x/y/ nor why you want to remove the "worry".

And observe that $foo ~~ S/a/b/ ~~ S/c/d/ becomes Bool ~~ Str, let me ask you: Was those substitutions useful?

2colours commented 2 years ago

Well, to be honest, I'm not sure what is going on, at this point...

https://glot.io/snippets/gf1ua1dqxk

It says "elmafa". Same with the freshly updated Raku evalbot on the IRC, same with my own instance of Raku v2022.07 on Ubuntu 20.04.

2colours commented 2 years ago

It turns out that despite the reasoning of Zoffix implied the current behavior, it only manifested this year, with https://github.com/rakudo/rakudo/commit/32401c4762a18c98b3d5b0bdd7c03b27400cb521. Well, interesting. I think it increased consistency regarding the behavior but I'm not sure anybody was fully aware of this. Maybe @vrurg you can recall something about this change?

Also @lizmat even if you are lurking, please don't edit my comments ever again. I'm not sure why anyone can do that in the first place; I think it's prone to manipulation and I can hardly think of something more disrespectful than changing one's words one-sidedly, without asking for any kind of consent.

lizmat commented 2 years ago

@2colours: you used the word "gaslighting", which in that context implied intent by @salortiz. Please don't be that disrespectful towards people trying to help you, especially an esteemed longtime community member.

I thought the removed words did not change any of the intent that you meant. If it did, I'm sorry. But if you did, I would have flagged your comment as inappropriate and it would have disrupted the discussion.

But your request is noted: I will not change any of your comments ever again.

2colours commented 2 years ago

@lizmat thank you for that. I think it's better if I am simply bearing responsibility for the words I'm using, they say it's a part of adulthood. :) Also, not to imply that I will (or even could) reform the manners of a community, but perhaps it's better to know someone's expressions in different situations; I think even the Raku conference has shown that even a dozen of people can have quite different attitudes under very similar circumstances. It can be inconvenient at times but I believe inconveniences don't have to lead to toxicity.

For the concrete scenario: I really did feel "gaslit", not the "intentional" part but the "challenged sense of sanity" part; and I think in this regard, it does make a difference that @salortiz didn't seem to reflect at all to the behavior I described and my call that his explanation doesn't match with what I experience. This made me question if I actually have seen what I have seen, or what I am getting wrong about this. Again: I'm not saying this was deliberate harm, it's probably an example of how confidence in your knowledge can affect how you communicate in a contradictory situation.

Back to my previous comment: I think "today we learned" - it's an interesting turn of events that Zoffix presented a reasoning in 2016 that still reads to me as something that applies to the current behavior, and yet, the current behavior is only a couple of months old... It would be really good to know what lead to the current behavior on the technical side, then. Maybe that would help to define what m//, S/// and s/// are, structurally: when they are executed, how they are parsed in an expression, and so on.

salortiz commented 2 years ago

I was certainly not aware of the optimization introduced by @vrurg commit and all my tests were done with an older (2021.12) version.

Cut & paste after a new build:

[sog@pilar ~]$ raku -v; raku -e "no worries; say 'alfama' ~~ S/a/e/"
Welcome to Rakudo™ v2021.12.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2021.12.
False
[sog@pilar ~]$ rakubrew switch moar-2022.07
Switching to moar-2022.07
[sog@pilar ~]$ raku -v; raku -e "no worries; say 'alfama' ~~ S/a/e/"
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.
elfama

Somehow that "optimization" changed the documented semantics which, I suspect, were not properly tested in roast.

Any way that change, IMO, seems broken because with 2022.07:

[sog@pilar ~]$ raku -e "no worries; say 'alfama' ~~ ($ = S/a/e/)"
False

Indeed "today we learned", already the "worry" doesn't make much sense and this isn't settled yet.

lizmat commented 2 years ago

For the concrete scenario: I really did feel "gaslit", not the "intentional" part but the "challenged sense of sanity" part;

Then you should have used words to that effect, and not have used the word "gaslighting".

\ If someone can not reproduce the results that you're seeing, even though you think that they should be the same because the setup is considered the same, then it generally just means that you've missed a variable in the experiment that did make a difference. And you should think in that direction, rather than (consciously or unconsciously) assume someone is trying to make you crazy. \

2colours commented 2 years ago

raku -e "no worries; say 'alfama' ~~ ($ = S/a/e/)" False

I wouldn't consider this "broken". "Less Than Awesome" perhaps but it makes sense at least, in my opinion. Here, it's clear that the right handside is evaluated to a string - the assignment takes care of that. Actually, even raku -e "no worries; say 'alfama' ~~ (S/a/e/)" makes sure the right handside is executed prior to the smartmatch.
What I consider interesting (but matches what you already said) is that (S/a/e/) was already executed on the left handside as the topic. I know that ~~ is a special operator (maybe even syntax special-cased?) but if it seems like an operator with a certain precedence, it shouldn't preset the topic.

Anyway, since it's a new finding that the behavior itself changed recently, I think it's worth for me to explain my "state of mind" when opening the issue.

  1. I expected the current behavior (based on the apparent analogy with s/// and m//)
  2. lizmat showed me the commits where the worry was added and then modified - the reasoning of Zoffix to me read like it confirmed the current behavior (although now I can see that perhaps this wasn't the case - it took some time...)
  3. the documentation reassured my expectation 1 (confer for example "It's common to use this operator with the ~~ smartmatch operator, as it aliases left-hand side to $_, which s/// uses." - same argument applies to S///)
  4. the documentation provides a vague argument that can be seen as a weak argument against the current behavior ("since the result is obtained as a return value, using this operator with the ~~ smartmatch operator is a mistake " - if the "return value" is propagated from the smartmatch - as it indeed is - this is not a mistake, just not bool-friendly).

Bonus issue: TR/// has the old behavior, AND it doesn't even produce a "worry", nor is there a note in the documentation about its apparent "uselessness". Now, no matter which one is the correct behavior, we have an inconsistency.

As we make tiny steps forward, I come to realize that m//, tr///, s/// and the likes were all created as immediately executing code, rather than a callback, and as such, they had some consistency. However, the way they mimicked being callbacks are very questionable and also poorly described. I think this goes way beyond whether S/// behaves right or not.

Eventually, either hacks like Str ~~ Match returns the Match need to be carefully documented - or, and I would really prefer this solution but I don't know how big the impact would be... - they need to be eliminated in the favor of "smartmatch-friendly" s///, m//, TR///, etc... everywhere.

vrurg commented 2 years ago

I don't currently have enough time to look deep into this. But what I see here is that:

  1. S/// is OKish as ~~ RHS because there is valid topic to operate upon.
  2. It is not OK that the resulting string leaks smartmatch as it is not a Match and has to be boolified.
  3. S/// isn't really useful this way because one would rather has to deliberately model a situation where the resulting string would boolify into False.

So, the current behavior is more correct correct that it isn't. The worry makes sense here. And it is a worry because there is nothing wrong about the construct, it's just barely bears any big meaning in it.

Regarding the consistency, TR/// should probably result in a warning too. At least for the moment it makes sense to me, unless I miss a point.

salortiz commented 2 years ago

@vrurg,

  1. It is not OK that the resulting string leaks smartmatch as it is not a Match and has to be boolified. So, the current behavior is more correct correct that it isn't.

Bisectable output confirms the leaks starts with your https://github.com/rakudo/rakudo/commit/32401c4762a18c98b3d5b0bdd7c03b27400cb521, so the previous behavior was more correct. Seems that the attempted optimization, in this case, not only isn't boolified, it is returning the RHS string without applying .ACCEPTS semantic.

2colours commented 2 years ago

Well I'm not happy that a possibly useful behavior is pushed back into the cage - but more so because of this remark:

It is not OK that the resulting string leaks smartmatch as it is not a Match and has to be boolified.

I'm definitely going to open an issue against the special casing of Matches - actually, I think if they stop being hacked in smartmatches, compatibility demands would push towards the smartmatch semantics I'm describing. Either way, the (otherwise undocumented) smartmatching behavior might be the biggest design mistake I've yet discovered in Raku...

jubilatious1 commented 2 years ago

I have always assumed the s/// versus S/// dichotomy was due to Perl's historical attempt at attempting to emulate BOTH sed and awk semantics under one architecture.

See: https://www.nntp.perl.org/group/perl.perl6.users/2020/05/msg8403.html

Essentially, I see this as a kludge which attempts to make uniform differences between a more-awk-like behavior which Perl6/Raku emulates well, versus the older sed-like behavior of 1). stream-processing i.e. automatic defaulting to stdin, and 2. 'autoprinting' i.e. providing a return value without an explicit print/put/say statement.

That being said, I've just upgraded to Rakudo_2022.07 and no longer see the need to add given $_ at the end of my one-liners:

~$  -ne 'S:g/foo/bar/.put given $_;'  file

...gives the same result as:

~$  -ne 'S:g/foo/bar/.put;'  file

So, something has changed: maybe the warning is no longer necessary?

jubilatious1 commented 2 years ago
~$ raku
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.

To exit type 'exit' or '^D'
[0] > no worries; say 'alfama' ~~ ($ = S/a/e/)
False
[0] > no worries; say 'alfama' ~~ ($_ = S/a/e/)
Cannot assign to an immutable value
  in block <unit> at <unknown file> line 1

[0] > no worries; say 'alfama' ~~ S/a/e/
elfama
[0] > no worries; say $_ if 'alfama' ~~ S/a/e/
alfama
[0] > no worries; say $_ if 'alfama' ~~ s/a/e/
Cannot modify an immutable Str (alfama)
  in block <unit> at <unknown file> line 1

[0] > no worries; say $/ if 'alfama' ~~ s/a/e/
Cannot modify an immutable Str (alfama)
  in block <unit> at <unknown file> line 1

[0] > no worries; say $/ if 'alfama' ~~ S/a/e/
「a」
[0] >

@salortiz @vrurg

vrurg commented 2 years ago

Except for no worries; say 'alfama' ~~ S/a/e/ which is now fixed on master, everything else looks correct.

2colours commented 2 years ago

So the only good thing that I opened the issue for got "fixed" so that it is useless again... Is there a way to close an issue as "deliberately did the opposite"?

2colours commented 2 years ago

https://imgflip.com/i/704v56

jubilatious1 commented 2 years ago

I'm not sure what to say @2colours . I've never read anywhere about the origins of s/// versus S/// ( or tr/// versus TR///) and the resulting assymetry in coding. What I wrote above is my own conjecture.

What I really would appreciate is a table that tells me what return value I can expect, and when I can expect it. I'm happy to contribute, to test code, but I'm not sure I'm able to write that table in the correct historical context.

vrurg commented 2 years ago

@jubilatious1 In this case smartmatch is basically equivalent to do { my $_ := <LHS>; <RHS>.ACCEPTS($_) }. <RHS> can make use of the topic variable. When <RHS> is a regex returning a Match then it is special-coded. But S/// is not of that kind because it returns a Str.

Therefore:

So, things are quite logical if we know how it works. That was one of things the original design documents of Perl 6 were trying to overcome from Perl 5 legacy: being too DWIMMY, up to the level where too many special cases were requiring too many exceptions. Probably special-casing of regexes in smartmatch hadn't been done in first place as in this case it wouldn't be creating confusion with S///. But OTOH it lets us have things like if $a ~~ /.../ { say $<submatch> } in very concise and yet readable way.

2colours commented 2 years ago

The problem isn't $a ~~ /.../, actually that one doesn't even need special casing. /.../ is just a regex with no dependencies, therefore it doesn't undergo any tricky side effects as the RHS of a smartmatch. Then we can witness a usual ACCEPTS call on that regex.

The "special casing" happens when a Match object is obtained on the RHS, so, for example m/.../ or s/../../ (tr/../../ produces an StrDistance object).

I wouldn't even call it "special casing", it's worse than that: calling ACCEPTS on a Match instance returns the instance itself, absolutely no matter what the LHS was. Nil ~~ $match returns $match, $match1 ~~ $match2 returns $match2, $string-that-has-nothing-to do-with-match1 ~~ $match1 returns $match1. For now, it's fortunately not in Roast but it's not unintended either.

I'm still not quite ready to open an issue for that. It's a big one that would either break $a ~~ m/.../ style code, or $a ~~ m/.../ kind of stuff would need to be syntax special-cased/semantically redefined indeed, much like an unintended special casing made $a ~~ S/../../ useful for a couple of Rakudo releases.

Anyway, that's what the meme targeted - that a useful regression has to go but an egregious design mistake that we can still target on the language level is taken at face value.

jubilatious1 commented 6 months ago
~$ raku
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.

#<SNIP>

[0] > no worries; say $/ if 'alfama' ~~ S/a/e/
「a」
[0] >

Okay, the above code doesn't work anymore. What gives?

~ % raku
Welcome to Rakudo™ v2023.05.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2023.05.

To exit type 'exit' or '^D'
[0] > no worries; say $/ if 'alfama' ~~ S/a/e/
()

@Util

lizmat commented 6 months ago

It appears https://github.com/rakudo/rakudo/commit/35b180b8c8b90c031072150e618ee40dccf23713 introduced this change.