Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0

Shared namespace between methods and a grammar's regexes causes name collision #401

Open codesections opened 1 year ago

codesections commented 1 year ago

Raku's current design (as discussed pervasively in S05) is that, inside a grammar, the syntax token foo {…} causes a method foo to be installed in that grammar (just as the syntax method foo {…} inside a class or grammar causes a method to be installed). That is,

grammar G { 
    token TOP { <.alpha> || <.punct> }
}

is (roughly) equivalent to

grammar G {
    method TOP { self.alpha || self.punct }
}

And what I just said about token applies equally to regex and rule (all of which I'm going to call regexes from now on).

Because regexes are methods, they follow the normal method resolution order (MRO); in particular, if a regex declared in a grammar uses the same name as any method that the grammar inherits, the regex comes earlier in the MRO. Thus, attempts to call the inherited method result in calls to the regex instead. This can be confusing, especially to users who do not realize that regexes are just methods with a different syntax.
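A minimal sketch of the shadowing described above, assuming current Rakudo behavior. It uses introspection instead of parsing so that nothing depends on Match internals; the grammar names Parent and Child and the method name describe are invented for illustration. .^find_method walks the MRO and returns the first candidate it finds.

```raku
# Hypothetical grammars, invented only to illustrate the shadowing.
grammar Parent {
    method describe() { 'a plain method' }
}

grammar Child is Parent {
    # Same name as the inherited method, but declared with `token`.
    token describe { \w+ }
}

# `.^find_method` returns the first candidate found along the MRO.
say Parent.^find_method('describe').^name;   # Method
say Child.^find_method('describe').^name;    # Regex: the token shadows the method
say Regex.isa(Method);                        # True: a token is a kind of Method
```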

codesections commented 1 year ago

As a note, I don't personally believe that this problem justifies any design changes to Raku; I think that the confusion can be minimized by being careful that implementations (e.g., Rakudo) are explicit about which methods they intend to call. Once that's done, I believe that the benefits of the current design in terms of simplicity/unification significantly outweigh any lingering confusion.

However, I recognize that not everyone agrees; I'm opening this PS issue so that we can have a place to discuss the design issues separate from attempts to fix bugs in the implementation of the current design (e.g., rakudo/rakudo#5468).

One potential solution that has been proposed is to have regexes install methods with some name prefix, following the model of operators (that is, give them a prefix that plays a similar namespacing role as the infix in infix:<+>). I am unsure how, under this proposal, a user could call a method from inside a regex (e.g., token b-or-error { b || <error("No 'b'")> }), but I imagine that discussion could generate some ideas. This would, however, break any existing code of that sort and would thus need to wait for a language release.

raiph commented 1 year ago

I'm currently confused by this issue.

Shared namespace between methods and a grammar's regexes causes name collision

The namespace is deliberately the same, so that regexes, tokens, rules, and methods are indistinguishable from a caller's point of view.

Thus, for example, <ident> calls the ident rule. This is currently declared with method, presumably for speed, but might one day be switched to being declared with token, and other rules might go from token to method. Grammars making use of these rules will be oblivious, because it's no business of the caller knowing an implementation detail such as whether the rule it's calling is declared with method or token, nor where it's declared. It just follows the MRO and ends up where it ends up.
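A small sketch of "it just follows the MRO", with hypothetical grammars Lower and Shouty: the TOP token is declared only in Lower, yet its <word> call picks up whichever word the actual grammar class provides first.

```raku
# Hypothetical grammars, for illustration only.
grammar Lower {
    token TOP  { <word> }          # calls whatever `word` the MRO finds first
    token word { <[a..z]>+ }
}

grammar Shouty is Lower {
    token word { <[A..Z]>+ }       # overrides Lower's word
}

say so Lower.parse('abc');    # True
say so Shouty.parse('abc');   # False: Lower's TOP now dispatches to Shouty's word
say so Shouty.parse('ABC');   # True
```

Lower's TOP neither knows nor cares that word was redefined; that is the caller's-eye view described above.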

defining regexes (including tokens/rules) in a grammar causes a method of the same name to be installed in that grammar

You make it sound like they're not all methods. Rules are methods.

(Though of course a rule declaration is not a method declaration, and vice versa.)

I think that the confusion can be minimized by being careful that implementations (e.g., Rakudo) are explicit about which methods they intend to call

What do you mean by explicit?

In the Rakudo compiler, the Match class does NQPMatchRole which contains a method ident. But if a user defines their own ident then an <ident> will call that instead. This is by design.

One potential solution that has been proposed is to have regexes install methods with some name prefix, following the model of operators

OMG.

niner commented 1 year ago

Seems to me that there is a set of methods that are used internally while also forming part of Match's interface, e.g. from or orig. How about simply warning if these get re-defined as a token/regex/rule in a grammar? If it was intentional and you know what you are doing, you can simply disable the warning. Otherwise you'll hopefully realize your mistake the first time you let the compiler see your code. Having to disable the warning would also be a useful sign to a human reader of the grammar, that it does something rather special.
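As a rough user-space approximation of this idea (a hypothetical helper, not an existing Rakudo warning or API), a grammar's own regexes can be checked against the names that Grammar already inherits from Match, Capture, Cool, Any and Mu. The names Risky and shadowed-names are invented for the example.

```raku
# Hypothetical helper (not a Rakudo feature): list the regexes declared
# directly in a grammar whose names Grammar already provides,
# i.e. names inherited from Match, Capture, Cool, Any or Mu.
sub shadowed-names(Mu:U $grammar) {
    $grammar.^methods(:local)
        .grep(Regex)                  # keep only token/regex/rule declarations
        .map(*.name)
        .grep({ Grammar.can($_) })    # is the name already taken upstream?
}

grammar Risky {
    token TOP  { <word> }
    token word { \w+ }
    token raku { 'Raku' }   # collides with Mu.raku
}

say shadowed-names(Risky);  # (raku)
```

A real warning would presumably live in the compiler and need a way to mark intentional overrides; this sketch only shows that the collision is detectable by introspection.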

codesections commented 1 year ago

The namespace is deliberately the same, so that regexes, tokens, rules, and methods are indistinguishable from a caller's point of view.

Thus, for example, <ident> calls the ident rule. … It just follows the MRO and ends up where it ends up.

I agree with all this and agree that it's a good design.

You make it sound like they're not all methods.

That wasn't intentional; I was trying to explain that they're just different syntaxes; i.e., that using the regex/rule/token keyword inside a grammar causes a method to be installed in the class in exactly the same way that using the method keyword in a class/grammar does. I've edited my comment to (hopefully) clarify that point.

I think that the confusion can be minimized by being careful that implementations (e.g., Rakudo) are explicit about which methods they intend to call

What do you mean by explicit?

I mean that anyone writing code in Match/etc. should think carefully about whether their intention is to call the last method in the MRO (including one that might be defined in a subclass) or to call a specific method. (This is a good idea in any code that might be subclassed, of course, but imo it is especially important in Match/etc.)

In some places, the code should be calling the last method in the MRO; your discussion of ident provides a good example. But in other cases, it shouldn't. For example, Match.raku currently calls self.from to display the start index of the Match. In that code, the intent is pretty clearly to call the specific from method that returns the start index, not to call the final from method in the MRO (which may well be a regex). Imo, code like that should be rewritten to self.Match::from or similar so that it calls the intended method even when a subclass implements a from method.
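The difference between a late-bound call and a qualified call can be sketched with plain classes standing in for Match and a user grammar (Engine and UserThing are hypothetical names). self.from goes through the MRO, while self.Engine::from always reaches Engine's own method, which is the same idea as self.Match::from above.

```raku
# Hypothetical classes standing in for Match and a user grammar.
class Engine {
    method from() { 42 }
    # Late-bound: dispatches to whatever `from` comes first in the MRO.
    method show-late()  { "from = { self.from }" }
    # Qualified: always calls Engine's own `from`, whatever subclasses do.
    method show-fixed() { "from = { self.Engine::from }" }
}

class UserThing is Engine {
    method from() { 'shadowed!' }
}

say UserThing.show-late;    # from = shadowed!
say UserThing.show-fixed;   # from = 42
```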

@niner wrote:

Seems to me that there is a set of methods that are used internally while also forming part of Match's interface, e.g. from or orig. How about simply warning if these get re-defined as a token/regex/rule in a grammar?

As explained above, I would prefer to fix Match's internals so that they work correctly even in the presence of a from method in a subclass. In terms of a grammar's external interface, a from method in a subclass shouldn't be a problem; in that case G.from would call the subclass's from, but G.Match::from still provides access to the Match's from method.

2colours commented 1 year ago

The namespace is deliberately the same, so that regexes, tokens, rules, and methods are indistinguishable from a caller's point of view.

It feels like there should be some justification as to why this is a good thing, though. Conceptually, there is a very clear distinction between defining entities that are used specifically for parsing textual content, and methods corresponding to a data type. There seems to be zero reason to allow for a token called "raku" to fall back to the .raku method of whatever data type, or a token called "from" to fall back to the .from method of Match in particular.

This is currently declared with method, presumably for speed, but might one day be switched to being declared with token, and other rules might go from token to method.

This is not an argument for the namespace sharing, only for implementability as methods, which the suggestion about prefixed names is compatible with.

it's no business of the caller knowing an implementation detail such as whether the rule it's calling is declared with method or token, nor where it's declared.

I think this is the same statement as the first one, and it would still require justification. Why would it be an implementation detail whether you call a real, ordinary method or something that is meant to comply with the regex interface? The implementation detail is that regexes somehow turn into methods under the hood, not the conceptual distinction. Actually, these methods, which are really only there for grammar purposes, already pollute the objects: $match.print overloads Any.print in a completely incompatible way, and that's not an "implementation detail" but a problem in a public API.

As explained above, I would prefer to fix Match's internals so that they work correctly even in the presence of a from method in a subclass. In terms of a grammar's external interface, a from method in a subclass shouldn't be a problem; in that case G.from would call the subclass's from, but G.Match::from still provides access to the Match's from method.

This is a good point, and actually a broader problem with the whole "it's just good old naive method calls" approach. The resolution probably already needs some tinkering to get reasonable behavior; it is not just the usual method resolution. This is where I'd come back to this point:

I am unsure how, under this proposal, a user could call a method from inside a regex (e.g., token b-or-error { b || <error("No 'b'")> }), but I imagine that discussion could generate some ideas.

It probably never should have been the same syntax since, as can be seen, with regex stuff you would like to fix the resolution to the built-in "methods", while with normal methods there seems to be no point in doing so. Again, I don't know what justifies the pretense that calling an error method and matching a regex are the same thing. If you want to call a method imperatively, perhaps you should use an imperative block inside your regex, which Raku supports.

Anyway, if we cannot even agree that $match.print doing something completely unrelated to Any.print is a problem then probably there is no point; that's my personal entry point for this issue to have any sense.

niner commented 1 year ago

Again, I don't know what justifies the pretense that calling an error method and matching a regex are the same thing. If you want to call a method imperatively, perhaps you should use an imperative block inside your regex, which Raku supports.

What if the method is not just an error path but does some actual matching, like https://github.com/niner/Inline-Perl5/blob/master/lib/v5-inline.pm6#L6? That would lead to the strange situation that a rule can refer to a regex, a rule, a token or a method, but only a method must have a different name. That would become a bit of a sore point in a language that tries hard to be consistent.

Anyway, if we cannot even agree that $match.print doing something completely unrelated to Any.print is a problem then probably there is no point; that's my personal entry point for this issue to have any sense.

Usually I would agree. It's just that every user object inherits from Any. And those that bring their own print methods or their own version of the plethora of methods that Any supports, don't necessarily carry the same meaning as those in Any (or Mu). E.g. on a Net::CUPS object, print may actually mean print to paper. While normally it's a no-go to change the meaning of a method in a subclass, Mu and Any are a bit special since they already hog so many premium names.

2colours commented 1 year ago

What if the method is not just an error path but does some actual matching, like https://github.com/niner/Inline-Perl5/blob/master/lib/v5-inline.pm6#L6? That would lead to the strange situation that a rule can refer to a regex, a rule, a token or a method, but only a method must have a different name. That would become a bit of a sore point in a language that tries hard to be consistent.

I don't really grasp the use of the word "consistent" here. I can't see why that would be inconsistent with the handling of operators, for example. Yes, you have to change the name of your subroutine add to prefix:<add> if you want it to be a prefix operator. It really is a different thing, just like here. It should rather raise a red flag if one wants the name to be the same out of mere laziness, or out of reluctance to acknowledge that these "methods" are a bit different from the methods an object has for usual method purposes.
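The operator precedent, as a minimal sketch: the routine below is installed as &prefix:<inc>, so its syntactic category is part of its name (inc is an invented operator used only for illustration).

```raku
# A user-defined prefix operator: its full name is `prefix:<inc>`,
# so the category is part of the routine's name.
sub prefix:<inc>($x) { $x + 1 }

say inc 41;                  # 42: operator syntax
say &prefix:<inc>(41);       # 42: the same routine, called by its full name
say &prefix:<inc>.name;      # prefix:<inc>
```

The prefixed-name proposal would apply the same idea to regex methods, so that the category (regex vs. ordinary method) travels with the name.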

Usually I would agree. It's just that every user object inherits from Any. And those that bring their own print methods or their own version of the plethora of methods that Any supports, don't necessarily carry the same meaning as those in Any (or Mu). E.g. on a Net::CUPS object, print may actually mean print to paper. While normally it's a no-go to change the meaning of a method in a subclass, Mu and Any are a bit special since they already hog so many premium names.

The bloated interface Any provides could be a topic of its own, but I think there is a huge difference here: the core language conflicts with itself, for no other reason than to be able to use the name "print" for a character class (iirc). We have a method in the core, with the name of a public core interface, that not only utterly contradicts that interface's semantics but that users would also rarely ever want to call on their own. What is it doing in the public core API, then?
Even PRINT would be a better name, but since the names of the regex and of the code called in the background are currently tied, renaming it would have immediate implications and cause the same breakage. This is why I suggested prefixed names à la operators instead. A breaking change, sure, but at least a "once and for all" kind of solution, one that only uses already existing principles, and one that seems technically feasible.

Anyway, I agree with @codesections that the details can be refined and clarified (what is worth doing and so on) if, and only if, there is a mutually recognized issue in the first place. For now, I think it is enough to establish at least that. I don't want to bang my head against the wall if people are as receptive to this issue as they were to the one about custom overloads of coercive operators.

codesections commented 1 year ago

The namespace is deliberately the same, so that regexes, tokens, rules, and methods are indistinguishable from a caller's point of view.

It feels like there should be some justification as to why this is a good thing, though.

@2colours, that's an entirely fair request. Here is an extended example, taken from the book Parsing with Perl 6 Regexes and Grammars:

grammar MathExpression {
    method parse($target, |c) {
        my $*HIGHWATER = 0;
        my $*LASTRULE;
        my $match = callsame;
        self.error($target) unless $match;
        return $match;
    }
    token TOP    { <sum> }
    rule sum     { <product>+ % '+' }
    rule product { <term>+ % '*' }
    rule term    { <number> | <group> }
    rule group   {
        '(' <sum> ')'
    }
    token number { \d+ }
    method ws()  {
        if self.pos > $*HIGHWATER {
            $*HIGHWATER = self.pos;
            $*LASTRULE = callframe(1).code.name;
        }
        callsame;
    }

    method error($target) {
        my $parsed = $target.substr(0, $*HIGHWATER).trim-trailing;
        my $line-no = $parsed.lines.elems;
        my $msg = "Cannot parse mathematical expression";
        $msg ~= "; error in rule $*LASTRULE" if $*LASTRULE;
        die "$msg at line $line-no";
    }
}

say MathExpression.parse("1 + ");

This code defines a custom ws method that it uses to track the farthest point that the grammar successfully matched (very useful for helpful error messages). And to do so, it's essential that the ws method be part of the same dispatch chain as the built-in ws token; otherwise callsame wouldn't dispatch to the existing ws.
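A stripped-down sketch of the same mechanism, assuming current Rakudo behavior: rule inserts implicit <.ws> calls, those calls dispatch by name to the grammar's own method ws, and callsame then continues to the inherited ws. The grammar KeyValue, its tokens lhs and rhs, and @ws-positions are hypothetical.

```raku
# Hypothetical grammar: KeyValue, lhs, rhs and @ws-positions are invented here.
my @ws-positions;

grammar KeyValue {
    rule  TOP { <lhs> '=' <rhs> }   # `rule` inserts implicit <.ws> calls
    token lhs { \w+ }
    token rhs { \w+ }
    method ws() {
        @ws-positions.push: self.pos;   # record where whitespace was tried
        callsame;                       # defer to the inherited ws
    }
}

say so KeyValue.parse('answer = 42');   # True
say @ws-positions.elems > 0;            # True: our ws method was called
```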

This is why I suggested prefixed names à la operators instead. A breaking change, sure, but at least a "once and for all" kind of solution, one that only uses already existing principles, and one that seems technically feasible.

When I first heard this suggestion, I was fairly skeptical. But, as we've continued the discussion, I'm starting to come around to it. Let me flesh out one way it could work:

This proposal would preserve the semantic unification: tokens would still just be methods, only with funny names (cf. S01, "operators are just functions with funny names and syntax."). And anyone could still call them via method syntax (Match.rx:<foo>). But, as @2colours has pointed out, they wouldn't clash with methods that have the same (non-prefixed) name.

And I believe this would avoid problems with scenarios like the Inline::Perl5 example @niner posted. If you want p5code to be callable in a regex via <p5code>, then you'd name it rx:<p5code>. (And, if for some reason you didn't, then calling code would still have the "imperative block" escape hatch @2colours mentioned earlier.)

I'm still not 100% sold on this change; it's a fairly significant breaking change, and I still believe that 90%+ of the existing name-collision problems could be solved by fixing the match-processing code so that it calls Match methods instead of the last method in the MRO.

But I'm currently at least tentatively inclined to think that @2colours's idea of changing to an operator-like "funny name" for regex methods would be an improvement. I look forward to hearing others' views.

raiph commented 1 year ago

I do not have the time and mental focus to keep up with this exchange with adequate quality comments. I have around 20 hours' worth of thinking and writing in gists etc. in response to this cluster of issues, anticipating, for example, the things @niner mentions in his comments. But I worry that my words will be misunderstood, and that there will be demands for justifications that are there in the 20 years of previous discussions and design docs but take enormous effort to search and summarize, and I just don't have the heart to get into that. But rather than end up posting absolutely nothing, I want to say something about the consequences that currently concern me if decisions at this grand scale are made without understanding what's at stake.

I think Raku's unification of regexes and methods is of consequence. I think it is a crown jewel of the Raku vision and design, in which Raku is intended to be a core tool to mix with other PLs. It relates not only to the more obvious aspects of its PL interop strategy, as borne out internally (in its slang concept, in all Raku PLs sharing a single semantic model, in Larry's talk about regexes and GPLs being peers) and externally (in @niner's foreign language adaptors), but also to how its regex/grammar engine is by far more relevant outside Raku than in.

Many PLs support methods. Many PLs gain great power if they have good regex/grammar capabilities. Part of the point of Raku was to evolve into a partner with other PLs, including: slangs developed in Raku land; existing PLs like Python or C or Perl; and new PLs that might deliberately build on what Raku offers to them as a tool, just as PLs 20 years ago leveraged PCRE.

A part of all this is what is a function, what is a method, what is a regex, what is a parser combinator.

Another part is the name of functions, the name of methods, the name of regexes*.

If Raku is to have a smooth interop story that lets other PLs view it as a tool worth integrating with, it needs the basics of integration to be low friction, and naming is a big deal. C++ has name mangling, and that shouldn't be a big deal, and with sufficient effort it isn't. But "sufficient effort" can be viewed as a euphemism for "friction". We don't want to introduce friction without understanding the consequences for the long-term health and possibilities of Raku.


That all said, yes, we need something ergonomic outside of these high falutin' future possibilities.

For example, <print> clashing with .print is not ergonomic.

But the latter is just an inevitable consequence of object orientation that isn't fully statically typed (which is to say, any OO that includes what makes OO especially effective in many practical programming scenarios).


Another key thing relevant to this discussion is the role of surprise -- the meaning of a WAT. A WAT is NOT a bad thing. It's a WAT. It's not necessarily a good thing either, but the right kinds and numbers of WATs can be extraordinarily good. This is backed by brain science results, though I do not have the time and energy to write about that here tonight. But in short, <print> colliding with .print is arguably a fantastic teachable moment, and we do not want to go to great effort to eliminate excellent teachable moments, especially not due to ignorance, because eliminating excellent teachable moments would make Raku harder to learn and use, not easier to learn and use.


I wish I had the time to make my comments shorter. The above was written in a hurry and may be full of errors. Please feel free to delete this comment if you think that would be beneficial to making this issue thread productive.

lizmat commented 1 year ago

Match's interface, e.g. from or orig

These should be covered by https://github.com/rakudo/rakudo/commit/f2c394120f now.

The problem is within NQPMatchRole, specifically when it is internally calling methods itself defines. One such case was the method Str. I think I got all cases covered now: the fix is really to provide that method verbatim as part of the Match class.

Another set of problems is, I believe, caused by the regex engine shortcutting method calls on what it thinks are its own "tokens", probably for performance reasons, such as <ws> (https://github.com/rakudo/rakudo/issues/1222). This will require further research.

codesections commented 1 year ago

@raiph wrote:

I do not have the time and mental focus to keep up with this exchange with adequate quality comments. … I want to say something about the consequences that currently concern me if decisions at this grand scale are made without understanding what's at stake.

Please take your time/don't feel rushed by the pace of this discussion. Raku has used the current design for years at this point, and we're not about to rush into a design change before exploring the solution space and reaching consensus that we've found a real improvement.

Design changes – especially breaking ones, and double-especially breaking ones for mature languages, used in production – definitely demand careful thought. It's nearly always better to take longer to introduce the right feature than to introduce the wrong feature right now. So I look forward to hearing your more considered thoughts, whenever you have the time/focus to share them ☺

2colours commented 1 year ago

This code defines a custom ws method that it uses to track the farthest point that the grammar successfully matched (very useful for helpful error messages). And to do so, it's essential that the ws method be part of the same dispatch chain as the built-in ws token; otherwise callsame wouldn't dispatch to the existing ws.

This is still not an argument to share namespace with methods like from, to, raku and so on. It's just an argument for allowing method definitions for tokens with any names, and for that, having a form like method token:<ws> would be an equally good solution.

I have the impression that the interface and the implementation get mixed up in this reasoning. What you want (and I agree that it is a valid demand) is to be able to define a regexish thing imperatively, so that it takes part in the same resolution as other regexish things. (Actually, you don't even seem to want this resolution to be the same as generic method resolution.) What gets argued for instead is one particular way to implement it: the method naming convention (i.e., the names are the same for methods and regexish things). For me, the latter is an interface problem. Any sort of "squatting" in either direction with the names seems to require justification, and so far I haven't received an answer to that.

Personally, it makes a big difference to me whether something is the way it is because somebody hasn't thought of the problems, or considered the problems irrelevant, or saw some merit in this approach that I haven't thought of. I'm not sold on the sentiment that in a self-proclaimed "100-year language" that has really conservative ways of keeping old code working, there cannot be breaking changes to anything, even if something turns out to be simply sloppy thinking... even PHP got to the point where they started radically eliminating messy stuff.

I also feel that there needs to be a way to address the following phenomenon without turning it into personal beef: during a lot of discussions, while some of us try to connect the topic to some sort of objective evidence (e.g., what happens for certain code, how it could work differently, etc.), there are regularly these "ambient comments" that contain something so broad and vague that they are basically unaddressable, yet they do shape the way the discussion goes. They remind me of the Chewbacca defense. And when these comments happen to imply that a particularly well-reasoned thing is being challenged hastily, after the same person summarized my suggestion with a mere "OMG", without bringing any sort of evidence for any of this, well, I don't know... Do you all think this is a constructive way to solve problems? I could suggest that I start doing the same "to make it fair", but I would rather suggest we stay away from these kinds of arguments... There should be another way.

lizmat commented 1 year ago

This is still not an argument to share namespace with methods like from, to, raku and so on.

FWIW, I think these are all fixed with f2c394120f2c4821b437 . If not, please provide a test-case.

The <ws> case is different. It is caused by special handling inside NQP: https://github.com/Raku/nqp/blob/main/src/QRegex/NFA.nqp#L376-L383 .

2colours commented 1 year ago

FWIW, I think these are all fixed with f2c394120f2c4821b437 . If not, please provide a test-case.

Just so it doesn't go unnoticed: we talked this through, and $match.print is still a valid example.

raiph commented 1 year ago

$match.print is still a valid example

The following are profoundly different categories of issues people seem to think exist:


I view each category of "collision" outlined above as having good and bad aspects. I think @2colours, and perhaps others, view some or all of them as just bad, with no good aspects. The latter was part of my motivation for writing this comment, so we can head off problematic "and so on"s. I hope everyone, @2colours included, recognizes that for the time being these "collision" categories should be discussed separately, but that it's premature to open separate issues. Agreed?

raiph commented 1 year ago

$match.print is still a valid example

As hinted above, I see this as belonging at most to a small category, and quite likely to a category of its own.

I say that because I've looked for others in the same category, where by "category" I mean the kind of categorization I outlined in my previous comment, and haven't seen one yet. I may of course have missed some, and others may of course arrive in years to come. But for now I consider print to be a specific issue all of its own.

I say "for now" to mean the main branch of Rakudo as it is today.

Until a week or two ago, it wasn't its own thing. There was a long-standing bug in which the name of any method defined in Mu or Any leaked into the use of grammars, because the default argument for the actions named parameter of the parse method of the Grammar class was Any. This resulted in utterly bizarre errors. Liz has since fixed that bug, and I presume we no longer need to discuss the myriad problems that bug caused.

raiph commented 1 year ago

$match.print is still a valid example

So what is the deal with <print> and $match.print?

Rather than discuss any negative or positive view of that "collision", this comment will focus on a neutral view of it.

(In a following comment I will discuss a positive view of it, based on a broad principle I plan to more fully introduce to Rakoons in the discussion elsewhere about infix + overloads.)

So, again, what is the deal with <print> and $match.print?

Well, first and foremost, the deal is that there is a deal, as follows:

  1. We could argue about whether it would be better named, say, printable, or token:print, but the bottom line is that we know that, if someone writes <print> in a grammar, their intent is to match a printable character.

  2. We could argue about whether it would be better written $match.Match::print, but the bottom line is that we know that, if someone writes $match.print in MAIN code outside a grammar, their intent is to print the match object.

  3. We could argue about whether it would be better written $match.token:print, but the bottom line is that we know that, if someone writes $match.print in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <print>) in that MAIN code.

  4. We could argue about whether the foregoing is a bad thing, a good thing, or a mix, but the bottom line is that it can lead to a WAT. Arguably a Mega WAT. And that is the deal.
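To make point 1 concrete, a minimal sketch of the predefined <print> rule matching printable characters (the sample strings are arbitrary).

```raku
# <print> is the predefined "printable character" rule from point 1.
say so 'abc'     ~~ /^ <print>+ $/;   # True
say so "a\x[07]" ~~ /^ <print>+ $/;   # False: BEL (U+0007) is a control character
```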

lizmat commented 1 year ago

FWIW, I think I can come up with a solution, but I want to do that after the 2023.11 release, as it will involve some pretty deep changes that may have ecosystem effects. So I don't want to do that 1 day before the release.

2colours commented 1 year ago

The following are profoundly different categories of issues people seem to think exist:


Honestly, I don't get the premise. Who "seem to think" these categories exist, and how did you conclude that? What are we categorizing at all? It seems you are categorizing the names themselves, rather than some backing concepts, correct me if I'm wrong about that.

For me, the very principle that you can arbitrarily categorize names that live in the same namespace, simply by their content, is kind of a problem, or at least it's going too far in this situation. Like, of course you can say that methods with math-y names will probably operate on math-y data types and they will be some math-y transformation; that's all semantics. However, it's not all semantics that you have grammar processors with their custom declarators and custom slang and special meaning inside other grammar processors - and they just march around as usual methods. That's a whole different specific interface - much like operators versus subroutines.

For the points you have raised for the print collision, I think most of them are highly uncontroversial; although for me the actual collision is more of a "last nail in the coffin" kind of thing, much like a real-life illustration of how wicked the principle is anyway. There is one I didn't agree with, though:

3. We could argue about whether it would be better written $match.token:print, but the bottom line is that we know that, if someone writes $match.print in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <print>) in that MAIN code.

I think we are hitting the barriers of "let's agree to disagree" with this. Your premise is that something like <print> is as true a method as it gets, while my premise is that it's only technically a method, conceptually being a thing of its own. For you, it may seem obvious that $match.print should refer to the character class inside the code of a grammar; for me, it's actually confusing and unjustified. I not only "don't know" that this is what somebody would mean, I actually protest against this behavior. If anybody really wants to invoke the character class imperatively, by all means, let them just write $match.token:print, instead of trying to be clever and context-sensitive with the meaning of $match.print.

raiph commented 1 year ago

The following are profoundly different categories of issues people seem to think exist:

Honestly, I don't get the premise.

I thought it might do good, and couldn't possibly do harm, to attempt to neutrally distill a small number of categories that cover the myriad backing concepts that have been introduced by those commenting in this thread.

Who "seem to think" these categories exist

Apologies for the ambiguity in my sentence. Let me rewrite it to see if that helps:

The following are profoundly different categories (of issues people seem to think exist).

and how did you conclude that?

I read their comments.

What are we categorizing at all?

The issues that people seem to think exist.

It seems you are categorizing the names themselves

Nope.

rather than some backing concepts

Everyone made a point of providing backing concepts, if you read their comments in full and used appropriate inference.

codesections:

"[explicit lists of methods that] implementations (e.g., Rakudo) ... intend to call"

"indistinguishable from a caller's point of view" ... [example ws]

niner:

"set of methods that are used internally while also forming part of Match's interface, e.g. from or orig."

"[methods that do] some actual matching like https://github.com/niner/Inline-Perl5/blob/master/lib/v5-inline.pm6#L6"

2colours:

"[methods] specifically for parsing textual content"

"methods corresponding to a data type"

"something that is meant to comply with the regex interface"

lizmat:

"Match's interface, e.g. from or orig"

"NQPMatchRole ... internally calling methods itself defines. One such case was the method Str"

"method calls on what [regex engine] thinks are its own "tokens", probably because of performance reasons"

Me:

"indistinguishable from a caller's point of view" ... [example ident]

"Match's internals"

"raku" [and other methods specified outside of grammars that are not also specified as pattern matching methods]

"print" [and other methods specified outside of grammars that are also specified as pattern matching methods]

"ws" [and other methods specified as pattern matching methods, used a lot in grammars, and documented as overridable]


  3. We could argue about whether it would be better written $match.token:print, but the bottom line is that we know that, if someone writes $match.print in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <print>) in that MAIN code.

I think we are hitting the barriers of "let's agree to disagree" with this.

I think I can see how you've misunderstood what I wrote.

As I wrote "We could argue about whether it would be better written $match.token:print."

So let's have that argument for a moment and suppose we went with the token:print name.

Do you agree that if someone wrote $match.token:print "in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <print>) in that MAIN code."?

Hmm. Maybe you're going to misunderstand <print>. Sigh. Again, let's paint the bikeshed another color for a mo:

Do you agree that if someone wrote $match.token:print "in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <token:print>) in that MAIN code."?

Now, if you simply replace the color the bikeshed has in our imaginary argument with the color it's actually painted today, and imagine that someone knows that rules are just methods and is looking at the code $match.print, do you agree that they would understand the writer's intent was to invoke the printable character assertion?

(Or is your mind so fixated on the paint being the color you imagine it should be that you are incapable of imagining that the color is the color it actually is?)

Your premise is that something like <print> is as true of a method as it gets while my premise is that it's only technically a method, conceptually being a thing on its own.

When did I ever say I see no distinction between a method whose purpose is to match text and a method whose purpose isn't to match text?

For you, it may seem obvious that $match.print should refer to the character class inside the code of a grammar

Not to a newbie, no. Not to someone who doesn't know they're all methods, no. Not to someone who has lost sight of that fact, no. Of course it won't be obvious to them/us if we're in any of those boats.

But the same is true of $match.print meaning calling the method .print on the $match object to someone who knows no programming language. It won't be obvious.

for me, it's actually confusing and unjustified.

To you? A newbie? Who is confused? Who are we supposed to be justifying it for?

let them just write $match.token:print, instead of trying to be clever and context-sensitive with the meaning of $match.print.

Who has ever been clever and context-sensitive with the meaning of $match.print? Part of the point of my breaking out what seem to be people's issues is how utterly absurd some of the scenarios are. Have you ever heard of someone writing $match.print? Not as an "Oh, look what I figured out. Wouldn't that be confusing?" But someone being too clever and context-sensitive for their own good.

I get the distinct impression that a lot of this is DIHWIDT nonsense. Unless you show me code in the wild where someone has done such a dumb thing as to write $match.print to mean invoking the character class matching assertion, I will refuse to believe anyone has, and will think that if anyone has done such a dumb thing, it's their problem, not ours. There are limits to the absurdities we need to care about, and this would be one of them.

2colours commented 1 year ago

Everyone made a point of providing backing concepts, if you read their comments in full and used appropriate inference.

Well then, it might be that I'm incapable of "reading comments in full and using appropriate inference". I still don't see how one single phenomenon (things that behave like a token are simply exposed as a particular method of the grammar, possibly inherited from Match or some role) turns into 3 or 4 different phenomena. I don't see why it makes a difference whether it's inherited from here or there, whether it is called ws or print, or even whether it is actually implemented using the token keyword, the regex keyword, or directly as a method that respects their internal behavior.
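The single phenomenon described here, a token simply being exposed as a method of the grammar, can be observed through introspection. A small sketch assuming a current Rakudo; G is a throwaway grammar used only for illustration:

```raku
grammar G {
    token TOP { <.alpha>+ }
}

# The token declarator installs an ordinary method named TOP in G.
my $top = G.^find_method('TOP');
say $top ~~ Regex;    # True
say $top ~~ Method;   # True (Regex is a subclass of Method)
say $top.name;        # TOP
```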

For me, there have been two different concepts all along. One is "apparent methods" that users call for simple OO reasons, like raku or to or from or print (on Any) or sum or... I think you get the idea. The other is "apparent regexes" (token/rule/regex, doesn't matter) that people use for parsing purposes, mostly within grammars; we could say that whole topic has its own API. For the former, the very purpose is that you give them a name and use them by name in method calls; for the latter, I think it is merely an implementation detail that token ws gets translated to method ws somewhere along the way, and I don't think this cross-contamination makes for a good API.
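As a concrete case of token ws standing in for method ws: a grammar replaces the inherited ws simply by declaring a token with that name, and every significant space in a rule then calls the new definition. A minimal sketch; the grammar Assignment and its tokens lhs and rhs are hypothetical names, chosen to avoid colliding with anything inherited:

```raku
grammar Assignment {
    # In a rule, significant whitespace in the pattern becomes calls to <.ws>.
    rule  TOP { <lhs> '=' <rhs> }
    token lhs { \w+ }
    token rhs { \w+ }
    # Declaring a token named ws replaces the inherited ws method;
    # this one accepts horizontal whitespace only, not newlines.
    token ws  { \h* }
}

say so Assignment.parse('a = b');     # True
say so Assignment.parse("a\n= b");    # False (the newline is not horizontal space)
```

Here the override is deliberate, which is the documented use of ws; it relies on the same method lookup that lets a token shadow an inherited method by accident.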

There are two interaction scenarios between the two concepts:

  1. somebody wants to call an "apparent method" as if it were an "apparent token". This could be illustrated by a regex like /<raku>/. I think this is a big fat DIHWIDT, a "please don't want to do this" kind of thing. You wouldn't want to call a method as a "token" unless you are sure it really is a "token", behavior-wise. And for that, I think it even helps if everything that really is a "token" lives in a token:-prefixed scope.
  2. somebody wants to call an "apparent token" as if it were an "apparent method". An average user probably wouldn't do this either, but I think it is fair in the lower levels of some complex parsing mechanism. And for this, there seems to be no need whatsoever for the mapping from the "token name" to the backing "method name" to be a mere identity. The prefix would do no harm, at least; it would just clarify the code - much like a sigil does, honestly.
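Scenario 2 is already expressible today: a grammar can contain a plain method that stands in for a token and calls other tokens with ordinary method-call syntax, as long as it returns the resulting Match. A sketch with hypothetical names (Digits, any-digits, ascii-digits), not taken from any existing codebase:

```raku
grammar Digits {
    # A plain method used where a token is expected: it returns the Match
    # produced by calling one of the tokens below as an ordinary method.
    method TOP(:$ascii-only) {
        $ascii-only ?? self.ascii-digits !! self.any-digits
    }
    token any-digits   { \d+ }          # any Unicode decimal digit
    token ascii-digits { <[0..9]>+ }    # ASCII 0-9 only
}

say so Digits.parse('42');                            # True
say so Digits.parse('٤٢');                            # True (Arabic-Indic digits)
say so Digits.parse('٤٢', args => \(:ascii-only));    # False
```

Under the prefix proposal, only the spelling of calls like self.ascii-digits would change; the mechanism itself would stay the same.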

As I wrote "We could argue about whether it would be better written $match.token:print."

I also didn't want to "argue about that". You stated something in the next sentence as an obvious fact, and I disagreed that it was obvious, or even desirable. I just said that if somebody indeed wants the behavior you claimed to be the obvious intent, they really should write, well, something else instead, whether it is .token:print or not.

Do you agree that if someone wrote $match.token:print "in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <print>) in that MAIN code."?

Yes, with that, I do agree. If they expressed explicit intent to call an "apparent token" as an "apparent method", that is fine.

Do you agree that if someone wrote $match.token:print "in MAIN code within a grammar (either a closure inside a regex, or a method within the grammar), their intent is to invoke the printable character assertion (the same as <token:print>) in that MAIN code."?

I see what you did there, but I think <print> already expresses the desire to add an "apparent token" to your declarative grammar code. I can't see what value adding token: would bring here. The added value is rather that <raku> would not be accepted unless the user created an "apparent token" named raku.

and imagine that someone knows that rules are just methods

I don't think this becomes okay just by "knowing it". I have known this for quite a while and I still think two distinct things are sitting in the same namespace. A user who wants to use "apparent methods" for OO reasons should never have to worry that somebody else installed "apparent tokens" squatting on the same names. Or vice versa, though that's probably less common. It seems to me that when you imply people should just learn this, you choose adding a trap entry to the documentation over fixing a design problem for good.

When did I ever say I see no distinction between a method whose purpose is to match text and a method whose purpose isn't to match text?

So you see the distinction, and you probably also know that the former category has its own keywords and syntactic support, but you don't think this is a reason to think of them as not "just methods". Well, I cannot relate, just like I couldn't relate to the four, apparently much more fine-grained categories you envisioned earlier.

Not to a newbie, no. Not to someone who doesn't know they're all methods, no. Not to someone who has lost sight of that fact, no. Of course it won't be obvious to them/us if we're in any of those boats.

But the same is true of $match.print meaning calling the method .print on the $match object to someone who knows no programming language. It won't be obvious.

Please, don't do this. Pretty please, no. How can we have a proper discussion about any design decision if you basically send everybody "back to learn" who doesn't agree with you already? As much as I have to respect and admire the effort you have been putting into spreading the word about Raku, I think what you are presenting right here and now is very damaging, and especially to beginners. A beginner will either feel silly, or get upset and walk away, if you just outright invalidate their experience. After two years of close attention and involvement, I feel confident enough in my knowledge that it doesn't really have an effect on me, I just don't take it at face value. But please, try to refrain from doing this, especially with beginners.

To you? A newbie? Who is confused? Who are we supposed to be justifying it for?

Well, I'm definitely not a newbie and not "confused" in a sense that would be relevant here - you kind of made it sound like these questions all describe me, not sure if that was the intention. But anyway, I don't think this is the right question. When I say "justification", I don't mean a "popular demand" kind of thing - I mean an objective, comprehensible reasoning that is the same to pretty much anybody who reads it.

For example, I can point out that the print method is a public interface coming from Any, and that $match.print not only fails to provide this interface but does something that, to the average user, is barely worthy of a public interface at all.
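For anyone who simply wants to output a Match under the current design, one way to sidestep the collision discussed here is to stringify first, so that the ordinary print is dispatched on a Str, or to use the print/put subs. A sketch, with Word as a hypothetical grammar:

```raku
grammar Word { token TOP { <.alpha>+ } }
my $m = Word.parse('hello');

# According to this thread, $m.print would resolve to the regex engine's
# printable-character method instead of printing, so convert to Str first
# (or use the subs) to get ordinary output.
$m.Str.print;   # prints "hello" without a newline
print "\n";
put ~$m;        # prints "hello" followed by a newline
```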

Who has ever been clever and context-sensitive with the meaning of $match.print? Part of the point of my breaking out what seem to be people's issues is how utterly absurd some of the scenarios are. Have you ever heard of someone writing $match.print? Not as an "Oh, look what I figured out. Wouldn't that be confusing?" But someone being too clever and context-sensitive for their own good.

I get the distinct impression that a lot of this is DIHWIDT nonsense. Unless you show me code in the wild where someone has done such a dumb thing as to write $match.print to mean invoking the character class matching assertion, I will refuse to believe anyone has, and will think that if anyone has done such a dumb thing, it's their problem, not ours. There are limits to the absurdities we need to care about, and this would be one of them.

Frankly, I don't buy this, and never will. I find it overly patronizing to declare very simple and perfectly sensible code nonsense. If you cannot imagine somebody plotting data using .print, you will probably have trouble imagining most of the code out there. Why do you even get to decide whether this is absurd or not?

Also, it really irks me that you need to say this to somehow dismiss factual and undeniable evidence about the problems of the whole approach. If you had actual positive proof that this whole namespace sharing was somehow good, and that the implicit fallbacks to usual OO methods are actually useful for something - which are the things I keep asking about in this issue - you probably wouldn't even have to force this DIHWIDT narrative onto everything, because your positive argument would just carry the discussion. So far, I have made a suggestion, and we can compare that suggestion with the current situation. This is a design proposal, there is no need to hurry, and everybody can be as involved or uninvolved as they wish. But please, at least, let's bring in some merits besides "this is how it is and it is very good because reasons and if you don't agree, you just don't know enough".

lizmat commented 12 months ago

FWIW, I have found a simple workaround to make sure that only action methods in the actions class are considered. Take this example:

grammar A { token TOP { <.print>+ } }
class B { }
say A.parse("foo", :actions(B));
# Cannot resolve caller print(B:U: A:D); none of these signatures matches:
#    (Mu: *%_)
#  in regex TOP ...

The workaround is to limit the MRO of the actions class:

grammar A { token TOP { <.print>+ } }
class B { 
    method ^mro($) { use nqp; nqp::list(::?CLASS) }
}
say A.parse("foo", :actions(B));
# ｢foo｣

I guess that could maybe be generalized, perhaps by creating an actions syntax, so you would be able to say:

actions B { ... }

with "actions" being an adapted "class" of which the HOW would be an adapted ClassHOW with the above mro method.

Would that make sense?

lizmat commented 12 months ago

Now available in the ecosystem as use actions

2colours commented 12 months ago

It probably makes sense, but I can't see how it belongs to this topic. The actions part rather belongs to https://github.com/rakudo/rakudo/issues/5468. As I understand it, that issue actually "got solved", except the fix had to be reverted, and now you are offering a module-space solution to it.

EDIT: hm, or was it https://github.com/rakudo/rakudo/issues/5443 instead?

2colours commented 11 months ago

Unless you show me code in the wild where someone has done such a dumb thing as to write $match.print to mean invoking the character class matching assertion, I will refuse to believe anyone has, and will think that if anyone has done such a dumb thing, it's their problem, not ours.

https://gist.github.com/jdv/70520031234f7b02274fb20197320136 https://irclogs.raku.org/raku/2023-12-14.html#14:58

I rest my case.

EDIT: I suspect even the phrasing went wrong, since $match.print meaning the character class assertion is the current behavior (the one that I agree nobody really meant, ever). In the previous paragraph, it was questioned whether somebody would write $match.print at all. I don't know why anyone would question that, but anyway, here you go: it caught none other than the release manager of Rakudo.