Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate

Artistic License 2.0

"In general, the left side of `~~` should not be auto-threading" #297

Closed raiph closed 2 years ago

raiph commented 3 years ago

In general, the left side of ~~ should not be auto-threading

Larry Wall, from https://www.nntp.perl.org/group/perl.perl6.users/2021/06/msg9966.html in May.

This seems like a very serious issue.

Please read Larry's two comments, taking on board the ramifications he outlines for both optimization and correctness/craziness if we let the LHS of ~~ autothread.

Update: I should have qualified that last sentence with "if the RHS is not a type object". Instead I did effectively the opposite, writing the following misleading comment, which I've struck out to reduce further confusion:

~~And then consider this extremely popular (wrong!?!) idiom:~~

where .all ~~ Foo

~~(which I recently discovered doesn't work for a lot of cases I hoped it would work for.)~~

In summary, this issue was intended to be "just" about Larry's comments, treating them with the respect they're due, and considering how to respond to them to fix existing problems in Rakudo hinted at in the email thread to which Larry contributed.

(Thanks to codesections for his first comment which brought my attention to the fact that Larry had specifically enabled the .all ~~ Foo idiom in 2015 and I now realize it looks like he was not arguing in his comments this year that it should be deprecated -- even if the outcome of this issue is that it does actually end up being deprecated in 6.e.)

lizmat commented 2 years ago

It appears I already created an exception to handle this in 2019 https://github.com/rakudo/rakudo/commit/b5d529c7e9 but it appears not to actually be in use.

lizmat commented 2 years ago

Whatever we decide to do, I think we will need to make that a language level change, so most likely deprecate it now for 6.e, and disallow altogether in a future version.

Some research shows that we need to decide:

1. Do we only disallow a junction on the left-hand side of ~~, or do we only throw in .ACCEPTS(Junction:D)? The latter would perhaps mean a less clear error message, or we just need to mention "smartmatching".
2. What do we recommend users as an alternative??

In any case, dying in either a special infix:<~~>(Junction:D, ...) candidate, or in a method ACCEPTS(Mu:D, Junction:D) candidate, both create significant spectest fallout.
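The second option can be illustrated in user space. The following is only a minimal sketch, assuming a hypothetical matcher class named Strict with a made-up error message; it is not the core candidate discussed above, just the same shape: a dedicated ACCEPTS candidate for Junction:D topics that dies instead of autothreading.

```raku
# Hypothetical matcher class; the name and the message are illustrative only.
class Strict {
    # Refuse junction topics outright instead of autothreading over them.
    multi method ACCEPTS(Strict:D: Junction:D \topic) {
        die "Junction topics are not supported when smartmatching against Strict";
    }
    # Every other topic goes through an ordinary type check.
    multi method ACCEPTS(Strict:D: Mu \topic) {
        topic ~~ Int
    }
}

my $matcher = Strict.new;
say $matcher.ACCEPTS(5);             # True

# The junction topic selects the narrower Junction:D candidate, which dies.
try $matcher.ACCEPTS(any(1, 2));
with $! { say "died: ", .message }
```

Because the candidate lives in the matcher, the error message can mention smartmatching explicitly, which is the concern raised in point 1.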

lizmat commented 2 years ago

@niner @vrurg @ugexe @japhb @JJ @codesections ping :-)

raiph commented 2 years ago

(This comment edited down to this line as I've decided it's noise. Click Edited beside my name if you want to see what it was.)

JJ commented 2 years ago

2. What do we recommend users as an alternative??

hyperoperators and/or reduce will probably work nicely

vrurg commented 2 years ago

I'm not pretending to fully understand all the problems with junctions and ACCEPTS. My preferred way of using junctions is to have them on RHS as it mostly makes things more readable. This is especially true since the moment I realized that smartmatching is not commutative.

Yet there is at least one place where I see things done wrong, and some of us are trying to draw conclusions based on that. @raiph refers to rakudo/rakudo#2676 and otherwise related issues. So, I started there. First, I wanted to see things in action. So, I did this:

class C {
    method ACCEPTS(\topic) {
        say "ACCEPTS(", topic, ")";
        (1..9).ACCEPTS(topic);
    }
}
say (0,3).one ~~ C.new;
say (0,3).one ~~ 1..9;

The class wrapper does what is expected despite falling back to the same Range.ACCEPTS. So, I pulled up the source:

    multi method ACCEPTS(Range:D: Mu \topic) {
        (topic cmp $!min) > -(!$!excludes-min)
          and (topic cmp $!max) < +(!$!excludes-max)
    }

Of course we get the wrong outcome. With rakudo/rakudo#4618 this should complain, as far as I understand. But at the moment it's not the autothreading which breaks the smartmatch, but rather the lack of it.

I was somewhat curious as to what happens if I enforce threading by introducing multi sub infix:<~~>(Junction:D \topic, Mu \matcher) candidate. It doesn't work out of the box due to the compiler stepping in first and optimizing the case. But cutting the compiler off by wrapping the smartmatch into a sub happens to also produce correct behaviour with the Range. So, here is the "fixed" version of smartmatch:

multi sub infix:<~~>(Junction:D \topic, Mu \matcher) {
    (topic.THREAD: { matcher.ACCEPTS: $_ }).Bool
}

And then the test code:

sub match(Mu \topic, Mu \matcher) {
    topic ~~ matcher
}
say so match((0,3).one, 1..9); # True

Tests with any, all, and none produce expected results too.

rakudo/rakudo#2814

I've got @lizmat's golf:

say <a>.classify: * eq "a"|"b";  # {False => [a], True => [a]}

What happens here is rather understandable and expected: we autothread over classify. In the current paradigm I see nothing wrong with it, even though it's a WAT case. Then it came to my mind to test a thing I never tried before:

say ("a"|"b" => "c").raku; # any("a", "b") => "c"

So, why can't this be the answer? I mean, wouldn't it look reasonable to get { any(True, False) => ["a"] }?

At this point I must admit the experiments took too much of my time and I can't look into other problematic points. But I rather convinced myself that what is needed is thorough clarification of signatures and use cases. Perhaps and likely there will be individual cases where use of junctions just doesn't make sense. But so far I consider the deprecation as overkill.

BTW, talking about things not making sense:

say so 1 cmp 0; # True
say so 1 cmp 1; # False
say so 1 cmp 2; # True

Really??? ;) Either Order must not boolify whatsoever, or enum values must boolify against the rules for numerics.
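The surprising results follow from Order being an enum with numeric values, so boolification follows the numeric rule (zero is false). A short sketch of what is going on:

```raku
# Order is an enum: Less is -1, Same is 0, More is 1.
say (1 cmp 1);            # Same
say (1 cmp 1).value;      # 0
say (1 cmp 2).value;      # -1

# Boolification follows the numeric value, so only Same is false.
say so (1 cmp 2);         # True
say so (1 cmp 1);         # False

# Comparing against the enum value states the intent explicitly.
say (1 cmp 1) === Same;   # True
```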

raiph commented 2 years ago

@vrurg

Thank you for doing some initial analysis of the issues I listed.

say (0,3).one ~~ C.new;
say (0,3).one ~~ 1..9;

at the moment it's not the autothreading which breaks the smartmatch, but rather lack of it

But, aiui, per (@)Larry (including jnthn), it needs to not autothread. So this would remain broken, which would be "fine".

say <a>.classify: * eq "a"|"b";  # {False => [a], True => [a]}

so far I consider the deprecation as overkill.

By "the deprecation" do you mean the use of junctions with classify as discussed in 2814 or the deprecation that is implicit in what (@)Larry (including jnthn) say needs to occur for things to work sanely and properly performantly (and which I think liz and myself and presumably at least some others already tentatively anticipate is going to happen at 6.e)?

vrurg commented 2 years ago

By "the deprecation" do you mean

The general deprecation of junctions on smartmatch LHS, first of all.

classify is a different case. BTW, it currently goes against more or less common rule of the core that a Junction in an argument would either be collapsed or the return value would be junctional.

After sleeping on it, I'm now even in doubt of rakudo/rakudo#4618 making real sense. It would effectively disable elementary operations of "a" cmp "a"|"b" and "a"|"b" cmp "a" which are nothing at all special or confusing! In this respect I'd rather consider moon-chilled's comment as an option, especially since I came to this very same idea independently this morning. :) Though it makes it hard to manage cases of several junctional elements as they must result in nested junctions. But then again, if it is feasible to implement then a user must be warned about performance implications of such arrays. Otherwise we'd stay in line with "junction in, junction out" rule.

vrurg commented 2 years ago

But, aiui, per (@)Larry (including jnthn), it needs to not autothread. So this would remain broken, which would be "fine".

BTW, rakudo/rakudo#2676 is another example of where decision is taken not because of some kind of conceptual problem with junctions on LHS, but rather due to a technical issue in the core implementation. I went down the same path as @lizmat did (without knowing that it had been done previously) and hit about the same issue with wrong dispatching.

raiph commented 2 years ago

Thanks for following up @vrurg,

so far I consider [The general deprecation of junctions on smartmatch LHS] as overkill.

It would help me if you addressed Larry's specific points.

You should read his two comments in full if you haven't already done so by clicking the link I provided at the start of this issue, but I'll include excerpts from them below, starting with his first comment:

In general, the left side of ~~ should not be auto-threading, or we revert to the suboptimal Perl semantics of smartmatching.

The reason we broke the symmetry of smartmatch is so that we can optimize based on the type of the right-hand argument, and allowing auto-threading of the other argument prevents that.

"when 3" is supposed to be optimizable to a jump table, but how do you calculate a jump from a junction?

So we biased smartmatching in the direction of coercion-first semantics rather early on (though, alas, not early enough to prevent Perl adoption of the earlier design).

In any case, the documentation says ~~ is defined underlyingly by .ACCEPTS, and for a Numeric pattern, that says: multi method ACCEPTS(Numeric:D: $other) "Returns True if $other can be coerced to Numeric and is numerically equal to the invocant (or both evaluate to NaN)".

And I'd be awfully surprised if any(4,3) can be coerced to Numeric in any meaningful way...

Afaict his fundamental issue is optimizing Raku's switch statement. So afaict you are arguing that it doesn't matter to you if Raku's switch statement is slow. Please tell me I'm misunderstanding!

classify is a different case.

Agreed. I thank you for your analysis, and hope it will continue, but given that you've concluded it's not related to this one, please write any further commentary about it in its own issue, not this one. TIA.

in doubt of rakudo/rakudo#4618 making real sense.

Again, I thank you for your analysis and ask that further discussion of it is from here on kept in its own issue.

BTW, rakudo/rakudo#2676 is another example of where decision is taken not because of some kind of conceptual problem with junctions on LHS

Larry's main point is about optimizability/optimization of Raku's switch statement, a critically important aspect of Raku's overall performance, not any conceptual problem.

That said, he also points out that the very simple scheme he described, while primarily justified today by the big and necessary performance improvement it brings, will also eliminate in one fell swoop lots of bugs with things as they are now in Rakudo when junctions are used as the topic of a smartmatch.

And he also points out there's a nice clear different way to express the same thing. Excerpting from his second comment in the email thread:

sugar for .ACCEPTS ... should not introduce special cases either. If you want to make use of a junction like that, you must write
when 3 ~~ $_ {...}
or
when 3 == $_ {...}
or so. This is enough to tell the optimizer not to make a jump table. (Though conceivably spesh could still do that for cases where $_ is provably an integer, I guess.)

vrurg commented 2 years ago

So, we draw no difference between "switch" statement and smartmatch? Pardon me, I'm not going this route. Taking functionality away from one construct simply because this is performance-costly for another one? After all, the subject of this issue is about the smartmatch op itself. From this point of view – see my comments above.

Speaking of given/when optimizations, there are two approaches I can think of right away. The first is restrictive: prohibit junctions as arguments of given and, perhaps, take other measures to limit what is supplied to when.

The second approach is akin to deopts: use a switch table when possible, fall back to slow path otherwise. Given that we can deduce the type of topic (sub foo(Int:D $n) { given $n {...} }, for example) it is realistic not to produce the fallback on some occasions. Moreover, the fallback is only unavoidable when the topic type is determined as Mu, which is far less probable than Any.
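A minimal sketch of the typed-topic case, using a hypothetical routine named describe. It only shows the language-level behaviour that makes the idea plausible (a junction argument autothreads over the call, so the given block itself only ever sees an Int); it says nothing about what the optimizer currently does.

```raku
# Hypothetical routine; Int:D means $n is never a Junction inside the body.
sub describe(Int:D $n) {
    given $n {
        when 0  { "zero" }
        when 1  { "one" }
        default { "many" }
    }
}

say describe(1);   # one
say describe(7);   # many

# A junction argument does not bind to Int:D, so the call is autothreaded:
# the body runs once per eigenstate and the result is itself a Junction.
say describe(any(0, 1)).WHAT;   # (Junction)
```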

If junctions remain allowed as topics for given/when then performance costs should be specifically documented.

But the basic point I stand at remains the same: junctions are to be handled wherever possible and only prohibited when no other options remain. Like in the case with sort, prohibiting them at cmp level is bad as there is nothing wrong with cmp itself. It is sort which doesn't know how to handle junctions, thus it is its job to throw whenever such element is encountered.

With regard to classify and cmp, they're coming from links published above. And one way or another, in my eyes they're good examples of bad judgments based upon questionable implementations.

raiph commented 2 years ago

So, we draw no difference between "switch" statement and smartmatch? Pardon me, I'm not going this route.

Me neither. The "switch" statement (which in Raku I take to mean use of when) is not the same as smartmatch. It's just one of several constructs that sometimes use smartmatch. Hopefully we're on the same page about that.

Taking functionality away from one construct simply because this is performance-costly for another one?

If you thought that's what I was saying, then no, I wasn't meaning that. It's not what I thought Larry was saying either.

As Larry explained in the quote I provided, no functionality was being taken away, just how one "spells" things -- one has to write when foo ~~ junction rather than when foo.

What is being taken away is a slew of known bugs, and likely many not known.

And doing that in a manner consistent with the consensus reached about optimizability by devs over a decade ago.

Btw, I just checked the where case and can confirm the deprecation / solution applies / works there too:

my @foo where *.all ~~ Int = ... # to be deprecated
my @foo where Int ~~ *.all = ... # check values are `Int`

After all, the subject of this issue is about the smartmatch op itself.

The title was me quoting Larry. (That's why the title was itself in quotemarks.)

If by "subject" you mean the substance of this issue -- what it's about, it's about what Larry wrote in the thread I linked, and the consequences of taking on board what he explained.

Speaking of given/when optimizations, there are two approaches I can think of right away. The first is restrictive: prohibit junctions as arguments of given

Aiui the problems (which aren't just performance) are general to .ACCEPTS with a junction topic. given is just one of many topicalizers but we need a resolution that will work for all of them.

The second approach is akin to deopts: use a switch table when possible, fall back to slow path otherwise.

If that's a decent approach, and solves the performance issues, then that's great, but I'll leave it to jnthn or someone else to comment. :)

the basic point I stand at remains the same: junctions are to be handled wherever possible and only prohibited when no other options remain.

The functionality is never prohibited. It's just folk being "lazy" if you will, by writing where *.all ~~ Int, rather than how it ought be, namely where Int ~~ *.all. Or when foo with a junction as the topic rather than when foo ~~ $_.

With regard to classify and cmp, they're coming from links published above.

Sure. I wrote them in the hope someone would take a look to see if they were related to this issue.

You took a look at some of them, for which I thank you, and, aiui, you were saying none of the ones you looked at were relevant to this issue, and I agreed with that, so am asking that we don't bog this issue down with further discussion of them here.

vrurg commented 2 years ago

Me neither. The "switch" statement (which in Raku I take to mean use of when) is not the same as smartmatch. It's just one of several constructs that sometimes use smartmatch. Hopefully we're on the same page about that.

Judging by this paragraph, we are; and yet later you slide back to problems with when and... that's it.

Taking functionality away from one construct simply because this is performance-costly for another one?

If you thought that's what I was saying, then no, I wasn't meaning that. It's not what I thought Larry was saying either.

Still, it boils down to that. Junctions on LHS of smartmatch is a functionality. This does not always work as expected and has to be fixed. But we're about to disable this arguing that it causes problems to when! Except for a few statements about ACCEPTS to which I'll get back later.

Ok, once again: why not disable junctions for when exclusively? I don't agree with this either, but let's assume it's an acceptable approach.

Moreover, by properly disabling junctions for when we can produce better and more explanatory error messages, whereas broken smartmatch would sometimes result in misleading and confusing ones.

Again, I'm not even convinced that semantic problems about when are not solvable. Optimizational issues – perhaps. But it feels to me as new-disp would make some difference here, speaking of the above mentioned fallback approach.

no functionality was being taken away, just how one "spells" things -- one has to write when foo ~~ junction rather than when foo.

Wrong. By disabling LHS use of junctions we will force devs into taking special measures to make sure they always get them on RHS (remember, we're not only on when side here!). Even though the number of such cases could be low, they prove that we would take something away. In particular, we talk about a bit of freedom of expression.

It is also wrong in a different way:

class C2 {...}
class C1 {
    multi method ACCEPTS(C2 --> True) { }
}
class C2 {
    multi method ACCEPTS(C1 --> False) { }
}

say C2 ~~ any(C1);
say any(C1) ~~ C2;

Or even easier:

class C1 { }
class C2 is C1 { }

say C2 ~~ any(C1);
say any(C1) ~~ C2;

And what if my code doesn't care for what reason C2 doesn't accept C1? The code must not even care about the content of LHS junction. But forcing me to only have it on RHS would require additional boilerplate code to have the task done.
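For reference, the asymmetry the second example relies on shows up even without junctions. A minimal sketch with two hypothetical classes, Base and Derived, standing in for C1 and C2:

```raku
# Hypothetical classes standing in for C1 and C2 from the example above.
class Base { }
class Derived is Base { }

# Matching against a type object is a type check on the left-hand side,
# so swapping the operands changes the answer.
say Derived ~~ Base;   # True
say Base ~~ Derived;   # False
```

Moving a junction from one side to the other therefore swaps which type check is performed, not just how the expression is spelled.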

BTW, this is where optimization is likely to be doomed because there is no proper solution without autothreading over a helper routine which would do smartmatches in the order I need them.

In the meantime I see some suggestions about LHS junctions being sent on RHS, which are based on commutative cases!

As Larry explained in the quote I provided

Let me disagree with Larry. In particular, he states:

In any case, the documentation says ~~ is defined underlyingly by .ACCEPTS, and for a Numeric pattern, that says: multi method ACCEPTS(Numeric:D: $other) "Returns True if $other can be coerced to Numeric and is numerically equal to the invocant (or both evaluate to NaN)".

And I'd be awfully surprised if any(4,3) can be coerced to Numeric in any meaningful way...

He puts ACCEPTS into a special domain, putting it aside from other methods. But it isn't. Since this works for a decent routine:

sub foo($v) { 2*$v }; 
say foo( all("1", "2") ); # all(2, 4)

I'd expect 3.ACCEPTS(all(3, 1)) to result in all(True, False). Should it be ACCEPTS who collapses the junction? Perhaps, not. Let's leave it to the smartmatch itself. Look, there is Mu.ACCEPTS candidate, which, apparently, doesn't collapse:

    multi method ACCEPTS(Mu:U \SELF: Junction:D \topic) is default {
        topic.THREAD: { SELF.ACCEPTS: $_ }
    }

So, either manual handling of junctions by a particular class, or ensuring that autothreading works on them, could fix the problem. Working autothreading would even make any("1", "3") work, as the following "workaround" demonstrates:

sub NUM(Numeric \matcher, Any:D \topic) {
    matcher.ACCEPTS(topic)
}
say NUM(3, any("1", "3")); # any(False, True)

BTW, the above part is also the answer to 3.ACCEPTS(any(3,4)) case Larry mentions in message from the first comment link.

What is being taken away is a slew of known bugs, and likely many not known.

Let's fix the bugs in the first place, perhaps? Luckily, doctors don't cut the whole hand unconditionally anymore upon encountering a tumor...

Allow me a little digression with relation to the following quote:

Btw, I just checked the where case and can confirm the deprecation / solution applies / works there too:
my @foo where *.all ~~ Int = ... # to be deprecated
my @foo where Int ~~ *.all = ... # check values are `Int`

I have some g... Oh, sorry no good news here! :)

my @a = 1,2,"a"; say ({.all ~~ Int}(@a)); # False
my @a = 1,2,"aa"; say ({Int ~~ .all}(@a)); # True

Or even more funny variant:

my @a = 1,2,"aa"; say ((Int ~~ *.all)(@a)); # all(3)

So, whateverables and smartmatch RHS are to be used with care as one may get not what's expected:

say ((Int ~~ *.all).WHAT); # (Junction)

Anyway, what is proposed here we call "to pour a baby out with dirty water", i.e. to throw away something useful together with something bad or useless. I don't like this semantical mixture of when with smartmatch as an operator and the concept behind the when itself.

codesections commented 2 years ago

I'm pretty torn on this issue. On one hand, the argument that Larry made on the mailing list (and that @raiph expanded on above) made some good points about the potential pitfalls of autothreading a junction's left hand side. And I'm of course reluctant to disagree with Larry – something that, despite how rarely he shares his views these days, I'm at risk of making a habit of.

But I also think @vrurg made a very good point when he said that Larry's discussion "puts ACCEPTS into a special domain, putting it aside from other methods" when it should be treated just like any other method. Or, as someone else put it on IRC:

just define Any.ACCEPTS differently. There is no "it", other than the object on the right, and the object on the right gets to decide how to match.

That quote comes from the implementer of the ACCEPTS semantics that Larry suggests removing – who also happens to be Larry Wall. (Roast commit, Rakudo commit, IRC discussion). Now, I'm perfectly aware that Larry can change his mind (in fact, I seem to recall a rule of some sort to that effect…). But I also think that the logic behind the currently spec'ed behavior still holds up – that is the behavior that people will expect.

Perhaps more importantly, it also seems like the behavior that people can reason about. I know that we'll be able to unlock significant performance gains when we have full autothreading support for junctions – but that only helps if the semantics of junctions is predictable enough for people to actually use them correctly. Otherwise, they'll be a source of confusing concurrency rather than simple concurrency.

One other, slightly unrelated response. @raiph said that the issue comes up due to "folk being 'lazy' if you will, by writing where *.all ~~ Int, rather than how it ought be, namely where Int ~~ *.all". But one of Raku's killer features (imo), is the flexibility it gives us to control word order to best express our thoughts – we can say $return-value.&is: $what-we-expected if that's clearer (or go the other way, and call a method like a function with :). So having to write where Int ~~ *.all instead of where *.all ~~ Int is a cost. (Especially since the latter pretty much reads as "where all of the values are integers", which is the exact thought being expressed). I understand the reasons for breaking ~~'s commutativity, but it's still a real tradeoff and a loss of functionality (again, imo).

This whole discussion is reminding me of another conversation @raiph and I had a while ago on reddit. @raiph, in that comment, you said:

Junctions are, imo, an unfinished and unpolished feature. If we are collectively undisciplined about how we deploy them before they become a more finished and polished feature, we are risking a future in which Raku begins to gain significant adoption because it gets sufficiently polished in general, but we shoot ourselves in the foot due to providing detractors with potent WTF ammunition.

More and more, I'm starting to come around to that point of view. There's clearly tremendous power here, but there also seem to be a number of fundamental questions that we haven't really resolved yet. As @lizmat suggested in one of the linked issues, it may be that this area needs a bit of an overhaul at a deeper level, even if not a tremendous amount changes with regard to the surface syntax/how junctions are used in workaday code.

codesections commented 2 years ago

After a bit more pondering, I've had two additional thoughts. But since one favors allowing junctions on the LHS of smartmatches and the other favors deprecating the behavior, I'm left about as torn as I was.

In favor of deprecation

A few times we've discussed the popular idiom of where @users.all ~~ Admin or similar, which would be invalid if we deprecate LHS junctions. I'd been imagining that the fix for that statement would be to write it as where Admin ~~ @users.all, which (imo) sacrifices a good deal of clarity. But there's an obvious alternative that had slipped my mind:

where @users.all R~~ Admin

That very slightly eases the pain of the deprecation – but I'm wondering if we could take it a step further. Specifically, could we add support for a prefix R in contexts where a term is being matched against the topic? That would support the following syntax:

where R Admin {…}
# instead of
where Admin ~~ $_ {…}

If we do that, it seems like it'd ease the pain/conceptual overhead of the deprecation quite a bit in the most common cases.

In favor of keeping the currently spec'd behavior

However, another consideration cuts the other way. It seems like it'd come up less often, but be more serious when it does. Specifically, I'm starting to think that this is not just a matter of spelling, contra @raiph's arguments above. It seems to be in many cases:

# instead of
3 | 4 ~~ 3;
# we'd write
3 ~~ 3 | 4;

But that example uses operands that effectively make ~~ commutative – and as @vrurg already mentioned upthread, ~~ is not guaranteed to be commutative (a frequent source of confusion for intermediate Rakoons). But what happens when it isn't?

# potentially to be deprecated
42 & 'foo' ~~ Cool       # OUTPUT: «True»
# can't be replaced with
Cool ~~ 42 & 'foo'       # OUTPUT: «False»

So it seems that this is not just a matter of spelling – it's something that actually will reduce the power of the ~~ operator.

(Slight aside: thinking through all this has left me very confused by one of the examples @raiph posted upthread.

# As raiph said, this *does*, in fact, check that all values are `Int`
my @foo where Int ~~ *.all = [1, 2, 3];
# But it seems like it shouldn't work?
say Int ~~ [1, 2, 3].all  # OUTPUT: «False»

So now I'm wondering if the (helpful!) behavior of the where clause described above is a bug, or if it's a subtlety that's still escaping me.)

jubilatious1 commented 2 years ago

Slight aside: thinking through all this has left me very confused by one of the examples @raiph posted upthread.

I said to myself, "Oh, maybe we restrict Types to the RHS of a smart-match", to clear things up a bit.

Maybe not (see where clause):

$ raku
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.06.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2021.06.

To exit type 'exit' or '^D'
> say Int ~~ [1, 2, 3].all
False
> say Rat ~~ [1, 2, 3].all
False
> say IntStr ~~ [1, 2, 3].all
False
> say [1, 2, 3].all ~~ Int
True
> my @foo where Int ~~ *.all = [1, 2, 3];
[1 2 3]
> my @bar where Int ~~ *.all=[1, 2, 3];
[1 2 3]
> my @baz where *.all=[1, 2, 3] ~~ Int;
Type check failed in assignment to @baz; expected <anon> but got Bool (Bool::False)
  in block <unit> at <unknown file> line 1

> my @baz where [1, 2, 3].all ~~ Int;
[]
>

vrurg commented 2 years ago

After a bit more pondering, I've had two additional thoughts. But since one favors allowing junctions on the LHS of smartmatches and the other favors deprecating the behavior, I'm left about as torn as I was.

Come to the dark side, we have cookies! :D I'm going to help you on this way. :)

where @users.all R~~ Admin

Looks promising, but it isn't. Here is the implementation of METAOP_REVERSE:

sub METAOP_REVERSE(\op) is implementation-detail {
    -> |args { op.(|args.reverse) }
}

So, what effectively happens here is your expression becomes where Admin ~~ @users.all. This will not work, apparently, for any @users element which would happen to be an Admin subclass.
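The operand swap is easy to see with a non-commutative arithmetic operator. In this minimal sketch, reversed is a hypothetical user-level helper that mimics the core routine for the two-argument case; it is not the core code itself.

```raku
# Hypothetical helper: wrap a binary operator so its operands are swapped.
sub reversed(&op) {
    -> \a, \b { op(b, a) }
}

say 2 R- 10;                        # 8, i.e. 10 - 2
say reversed(&infix:<->)(2, 10);    # 8, same swap done by hand
```

Nothing in the swap looks inside the operands, so a junction on the left of R~~ simply ends up on the right of a plain ~~.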

To make R work as expected under the condition that LHS junctions are disabled, it would have to manually (or semi-automatically) thread the junction, reverse the op on each individual eigenstate, and wrap the results back into a junction. At this point we should give a sorrowful look at the "improve when optimization" reason behind LHS prohibition...

Not to mention that the any(...) R~~ topic syntax is weird.

# As raiph said, this *does*, in fact, check that all values are `Int`
my @foo where Int ~~ *.all = [1, 2, 3];
# But it seems like it shouldn't work?
say Int ~~ [1, 2, 3].all  # OUTPUT: «False»
So now I'm wondering if the (helpful!) behavior of the where clause described above is a bug, or if it's a subtlety that's still escaping me.

It's an often overlooked fact, which is actually stated first in the documentation:

The smartmatch operator aliases the left-hand side to $_

For example:

42 ~~ *.say; # 42

So, basically you can get rid of the smartmatch and get the same result with:

my @foo where Int = [1,2,"a"]; # Throws
my @foo where Int = [1,2,3]; # Works

codesections commented 2 years ago

@vrurg, thanks, that's all very helpful. I've been on the fence enough that I certainly wouldn't call your position the "dark side" (not that I'd turn down a cookie, of course).

It's an often overlooked fact, which is actually stated first in the documentation:

The smartmatch operator aliases the left-hand side to $_

Oh wow, I'd seen that sentence before, but never grasped what it means until you just showed me! I'd always read that as saying that the LHS got the value of $_ (when there isn't a LHS, such as with when), not that $_ gets the value of the LHS. That's very good to know! (I didn't realize we had any non-block-delimited constructs that change the value of $_, not counting things like postfix with that stand in for a block construct even though they don't use {…}. Are there any others?)

vrurg commented 2 years ago

Are there any others?

andthen, notandthen, orelse. Guess, that's all.

CIAvash commented 2 years ago

Sorry for going off-topic.

@jubilatious1, @vrurg for using where clause in assignment, variable and signature binding, take a look at Raku/doc#2368

my @a where .raku.say := Array[Int].new: 1,2,3
=output Int␤

my @a where .raku.say = Array[Int].new: 1,2,3
=output 1␤2␤3␤

sub f (@a where .raku.say) {}; f Array[Int].new: 1,2,3
=output Array[Int].new(1, 2, 3)␤

From the IRC link mentioned in the aforementioned doc issue:

<Zoffix> TimToady: FWIW there's expert opinion wanted. RE: why @foo where … applies to the entire array in params, but to individual elements in variables: https://github.com/rakudo/rakudo/issues/1414#issuecomment-358021893
<TimToady> someone probably just implemented the one as an anonymous subset, so it distributes like a subset type
<TimToady> we should probably make them consistent
<Zoffix> Noted.
<TimToady> otoh, binding is for the whole array, and most variables are constructed piecemeal by individual assignments that each have to be checked, so the situations aren't equivalent

As for the behavior of ~~, I am torn as well, what is the replacement for .all ~~ Int?

say Int ~~ (1,2,3)».&{.WHAT}.all
=output True␤

? But this returns true as well:

say 5 ~~ (1,2,3)».&{.WHAT}.all
=output True␤

Does it make sense?

Is the ACCEPTS method of Junction wrong?

Maybe:

say so all (1,2,3) »~~» Int
=output True␤

I think @vrurg and @codesections make some good points.

jubilatious1 commented 2 years ago

Hmmm.

https://perldoc.perl.org/perlsyn#Differences-from-Raku

vrurg commented 2 years ago

Sorry for going off-topic.

So, this had better be discussed somewhere else. Reddit, SO?

what is the replacement for .all ~~ Int?

There are planned replacements. And they might even be some performance win on falsy cases. The problem is that they only cover type matches, leaving one with no options for other cases where RHS uses custom ACCEPTS.

But this returns true as well:
say 5 ~~ (1,2,3)».&{.WHAT}.all
=output True␤

Sure it does. (1,2,3)».&{.WHAT} gives you ((Int) (Int) (Int)). So, 5 is all Int in your case.
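
A small sketch of that explanation, assuming only the standard behaviour of a junction on the right-hand side (the variable names are illustrative): once the list has been mapped to type objects, the junction no longer knows anything about the original values, so anything that is an Int matches it.

```raku
# Map the values to their type objects: three copies of Int.
my @types = (1, 2, 3).map(*.WHAT);

my $five = 5   ~~ @types.all;   # 5 is accepted by every Int type object
my $int  = Int ~~ @types.all;   # the Int type object matches itself
my $str  = 'a' ~~ @types.all;   # a Str is not accepted by Int

say $five;   # True
say $int;    # True
say $str;    # False
```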

say so all (1,2,3) »~~» Int doesn't make sense because all is not a list. If LHS junction eigenstates are lists, then the metaop should be individually applied to each one and resulting lists are wrapped back into the junction type.

CIAvash commented 2 years ago

say so all (1,2,3) »~~» Int doesn't make sense because all is not a list. If LHS junction eigenstates are lists, then the metaop should be individually applied to each one and resulting lists are wrapped back into the junction type.

The problem is not all, but the hyperoperators going deep. So these work:

say so all [1, 2, 3] Z~~ Int xx ∞;
=output True␤

say so all [[1,2,3], <a b c>] Z~~ List xx ∞;
=output True␤

say [&&] [1, 2, 3] Z~~ Int xx ∞;
=output True␤

say [&&] [[1,2,3], <a b c>] Z~~ List xx ∞;
=output True␤

say [1,2,3].map(* ~~ Int).all.so
=output True␤

say [[1,2,3], <a b c>].map(* ~~ List).all.so
=output True␤
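
The alternatives above all match element by element instead of relying on the left side of ~~ autothreading. A sketch of the same idea wrapped in a helper; all-match is a hypothetical name, not something Raku or Rakudo provides:

```raku
# Hypothetical helper: True when every element of @values is accepted
# by $matcher. Each element is matched individually, so no junction
# ever appears on the left side of a smartmatch.
sub all-match(@values, Mu $matcher --> Bool) {
    so @values.map({ ?$matcher.ACCEPTS($_) }).all
}

say all-match([1, 2, 3], Int);                # True
say all-match([1, 'a', 3], Int);              # False
say all-match([[1, 2, 3], <a b c>], List);    # True
say all-match([2, 4], 1..5);                  # True
```

Because the helper calls ACCEPTS directly, it works for any matcher (a type object, a Range, or a user-defined ACCEPTS), not only for type checks.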

raiph commented 2 years ago

@codesections

That quote comes from the implementer of the ACCEPTS semantics that Larry suggests removing – who also happens to be Larry Wall.

I am now convinced that it is not Larry who was ever suggesting removing those semantics. Instead it is, or rather was, me, due to a train of sloppy thought on my part, not Larry's.

Indeed, I imagine Larry thinks precisely as he did when he designed smart matching long ago, and also when he introduced the change in 2015 that you've unearthed, which unearthing (thank you!!!) is what has led to my comment here.

To explain myself, let me repeat most of my opening comment, and comment on what I think I got right, and what I got wrong:

Please read Larry's two comments

That was good. Though I will now add that folk still interested in this issue perhaps ought also read and understand the original design specification of Smartmatching, or at least the opening three paragraphs, before commenting further.

taking on board the ramifications he outlines for both optimization and correctness/craziness

That was also good. There are things to resolve here if Rakudo is not following the original design per the opening three paragraphs in the design section I just linked. Or, if it is, then perhaps there are doc issues.

if we let the LHS of ~~ autothread.

That is also to the point, but disastrously sloppy.

What Larry wrote was:

In general, the left side of ~~ should not be auto-threading

He did not write:

the left side of ~~ should not be auto-threading

And then my sloppiness went into overdrive:

And then consider this extremely popular (wrong!?!) idiom:
where .all ~~ Foo

Given that Larry had specifically changed things in 2015 to enable that idiom (where *a type object* on the RHS of `~~` *does* autothread), my using a capitalized `Foo` was a master stroke in deception, one I have accidentally unleashed on this thread.

And then I said something that would have been a good comment if I hadn't done the `Foo` thing:

> (which I recently discovered doesn't work for a lot of cases I hoped it would work for.)

The salient point left here is that, in general, in current Rakudo, it generally doesn't work if `Foo` is *not* a type object, and that's OK, and my hopes were unreasonable because I was ignoring how it's supposed to work.

----

So, in the hope this clears up this `Foo` aspect of this issue, I want to be crystal clear that I do *not* think Larry was changing his mind in his mailing list comments this year.

Instead I think he was reinforcing that code like *this* should *not* auto-thread:

any(1,2,3) ~~ 3


Because, aiui:

A) It's code like *that* that destroys optimizability, for the reasons he outlined on the mailing list.

B) It is *much* easier to understand (at least it is for me) how making `any(1,2,3) ~~ 3` a run-time error, with a message telling a user they can write `3 ~~ any(1,2,3)` instead, is not only in the spirit of the design as it has always been, but is a perfectly reasonable design, all things considered.

C) We need to document this well enough that folk aren't confused.

> I also think that the logic behind the currently spec'ed behavior still holds up – that is the behavior that people will expect.

Indeed.

----

I'm hoping I've not driven folk to fixed positions. Please stay open minded. Please start over fresh in what you think Larry is saying. Focus on *his* words as authoritative, not mine.

CIAvash commented 2 years ago

[OFF-TOPIC] OK, I'm going off-topic, again! In the link @raiph mentioned, the design document talks about binding and pattern matching, I don't know if I'd forgotten that or hadn't read it before. Read this section. cc @codesections

vrurg commented 2 years ago

B) It is much easier to understand (at least it is for me) how making any(1,2,3) ~~ 3 a run-time error, with a message telling a user they can write 3 ~~ any(1,2,3) instead, is not only in the spirit of the design as it has always been, but is a perfectly reasonable design, all things considered.

After this paragraph any further discussion loses any sense. ~~ is not commutative. Period. Even though this particular case seemingly is, I have provided an example where it isn't. And not only "is a type?" kind of ops should be considered here, but any user-implemented ACCEPTS method, about which we're not able to make any assumptions.

Another thing which has crossed my mind recently is: how did it happen that the statement "junction must not autothread on LHS" eventually turned into "junction on LHS must be disabled"? I'm not currently getting deep into this, but giving ACCEPTS a candidate to handle junctions effectively turns autothreading off for the method. It's not our business to intervene and prohibit the method from implementing the same kind of behavior we would currently get by autothreading over it. So, does it mean we have to guard over its return value afterwards and block any junction returned? BTW, the smartmatch op itself doesn't autothread already, as it takes Mu.

lizmat commented 2 years ago

It is sort which doesn't know how to handle junctions, thus it is its job to throw whenever such element is encountered.

Checking every element for Junctions would not help in the performance of .sort :-(

raiph commented 2 years ago

@codesections

Please see my latest update of the comment I wrote that opened this issue.

Larry's discussion "puts ACCEPTS into a special domain, putting it aside from other methods"

That makes it sound like it's a problem that it's in a special domain, and that Larry only just changed it to be that way. But consider the original design verbiage:

The first section [of the smartmatch table] contains privileged syntax; if a match can be done via one of those entries, it will be. These special syntaxes are dispatched by their form rather than their type. Otherwise the rest of the table is used, and the match will be dispatched according to the normal method dispatch rules. The optimizer is allowed to assume that no additional match operators are defined after compile time, so if the pattern types are evident at compile time, the jump table can be optimized. However, the syntax of this part of the table is still somewhat privileged, insofar as the ~~ operator is one of the few operators in Perl that does not use multiple dispatch. Instead, type-based smart matches singly dispatch to an underlying method belonging to the X pattern object.

Each of the bold bits is special. That's a whole lot of ACCEPTS (very deliberately and emphatically) being in a special domain! :)

Continuing the design verbiage:

In other words, smart matches are dispatched first on the basis of the pattern's form or type (the X below), and then that pattern itself decides whether and how to pay attention to the type of the topic ($_).

The bold bit explicitly emphasizes that the smart matching invocant is expected to decide how to treat its topic. Indeed that seems to be the very heart of what the word smart in "smart matching" is about.

All in all, there's no doubt in my mind that ACCEPTS has always deliberately been in a very "special domain", and part of that is that individual invocant types should decide what they'll do with their topic.

(Of course, that doesn't mean it should be anything other than sensible. I'm just sayin'.)

This issue isn't just about smart matching. It's about using it with another feature, namely junctions.

And what can we say about the junctions design? Well, junctions were very deliberately designed so that any individual parameter of any individual routine, built in or user defined, could choose which of its arguments would auto-thread if they're junctions.

Again, things still need to be sensible. But again, I'm just sayin'. For example, say doesn't auto-thread its arguments. :)
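
A minimal sketch of that per-parameter choice, using two made-up subs (via-any and via-mu are illustrative names): an Any-typed parameter cannot bind a junction, so the call autothreads, while a Mu-typed parameter receives the junction as a single value.

```raku
sub via-any(Any $x) { $x.^name }   # Any excludes Junction: the call autothreads
sub via-mu(Mu $x)   { $x.^name }   # Mu includes Junction: called exactly once

my $threaded = via-any(1|2);   # one call per eigenstate, results re-wrapped
my $whole    = via-mu(1|2);    # a single call that sees the junction itself

say $threaded;   # any(Int, Int)
say $whole;      # Junction
```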

Putting aside how things have thus far been designed/implemented, smart matching is a core part of Rakudo, and so are junctions. So how should they work together?

The late great John Shutt's PhD focused on perhaps the most regular serious PL in the world (Kernel). About a decade later, shortly before his recent untimely passing, he wrote Irregularity in language, one of his last blog posts. In it he wrote:

Sapient thought structures are too volatile to fit neatly into a single rigid format; large parts of a language, relatively far from its semantic core, may be tolerably regular, but the closer things get to its semantic core, the more often they call for variant structure. It may even be advantageous for elements near the core to be just slightly out of tune with each other, so they create (to use another physics metaphor) a complex interference pattern that can be exploited to slip sapient-semantic notions through the formal structure.

To be clear, human languages show this characteristic. I was listening to a radio show recently pointing out that kids have developed incredible nuance around saying OK. Apparently all of these express different tones and semantics, and kids increasingly recognize these worldwide: ok, Ok, OK, ok., okay!, k, kk, and so on. As I heard this I realized that even I, an old fogey, had been unconsciously taking on some of these distinctions.

I think this corresponds to what Larry said about designing the paths on a new campus at UCLA in the 1970s. The approach was striking. They came in and built all the buildings. But they didn't create any paths between the buildings. Faculty and students had to traipse through mud or whatever to get from A to B. A year later the construction folk came back and generally built paths where students and faculty had ended up walking, albeit with a few changes. This turned out to be a fantastic approach to design, one that was highly successful and efficient in all kinds of dimensions.

In this case, the design of smart matching and of junctions and other basic features are like the buildings. Then we're figuring out where people want to go, and then generally just accepting whatever junctions appear, though in some instances it depends on the types involved.

[Larry vs Larry]

I'm not saying Larry has or has not changed his mind, but if you've read my edit of my opening comment it's hopefully now clear to you that what he wrote this year is consistent with the change he made in 2015 which in turn is consistent with the original design.

I also think that the logic behind the currently spec'ed behavior still holds up – that is the behavior that people will expect.

Yes, if the spec'd behaviour is that, in general, the left side of ~~ should not be auto-threading.

Perhaps more importantly ... semantics of junctions [needs to be] predictable enough for people to actually use them correctly.

Yes.

At the level we're talking here there's only one aspect of them that's relevant; do they auto-thread or not? Aiui, per Larry's view, if a type does not define its own ACCEPTS method, then the default situation is:

* If the ACCEPTS invocant is a type object of type Any, or a subtype, it auto-threads its topic. (So code like where .all ~~ Bar will work, but will plausibly never be optimizable at compile-time.)
* Otherwise it doesn't. (And likely can be optimized at compile-time.)

I don't currently see that as hard to understand or reason about, even if, as Larry notes, it might perhaps warrant a Perl-to-Raku trap note because Perl introduced its own "Smart matching" that works differently.

"folk being 'lazy' if you will, by writing where *.all ~~ Int, rather than how it ought be, namely where Int ~~ *.all".

What I was trying to convey was what Larry pointed out in the email thread. That is to say, instead of writing when foo or when Bar, one can write when Bar ~~ foo or when foo ~~ Bar. Yes, it may read less well, but one can do it.

having to write where Int ~~ *.all instead of where *.all ~~ Int is a cost. (Especially since the latter pretty much reads as "where all of the values are integers", which is the exact thought being expressed).

Fwiw I think the reading "Int accepts all" is about as reasonable:

where Int ~~ *.all

But I get that it would be a shocking change. That's one of the main reasons why I opened this as a problem solving issue. (And have been tempted to close it, given that it's looking like this change will not happen.)

I understand the reasons for breaking ~~'s commutativity

Hang on. Do you, er, accept that it not being commutative is a fundamental part of the design, one that we can't realistically revisit even if we wanted to? Not due to optimizability, but it already being deeply baked in?

@raiph, in that comment, you said: "Junctions are, imo, an unfinished and unpolished feature. ..."

More and more, I'm starting to come around to that point of view.

Yes. I still think it needs an overhaul. But I currently think that'll be 6.f at the earliest unless a capable champion takes it on immediately, and likely even then.

# instead of
3 | 4 ~~ 3;
# we'd write
3 ~~ 3 | 4;

Following Larry's guidance in his emails I think one would realistically use set operations for that instead of smart matching with junctions.
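
A sketch of what that could look like for a simple membership test (the variable names are illustrative, and this is one reading of the suggestion, not a quote from the emails). The set operator ∈, also spelled (elem), needs no junction at all; the junction-with-== form is shown alongside for comparison.

```raku
my @candidates = 3, 4;

# Set membership: no junction and no smartmatch involved.
my $in-set = 3 ∈ @candidates;
my $not-in = 5 ∈ @candidates;

# Junction with a plain comparison operator, collapsed to a Bool.
my $via-junction = ?(3 == any(@candidates));

say $in-set;         # True
say $not-in;         # False
say $via-junction;   # True
```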

# potentially to be deprecated
42 & 'foo' ~~ Cool       # OUTPUT: «True»
# can't be replaced with
Cool ~~ 42 & 'foo'       # OUTPUT: «False»

I don't think Larry was suggesting the switch in this way. I think that was my sloppy thinking/writing that got us into this branch of the rabbit hole.

I think he was saying that, in response to being told that this kind of result occurs in some recent Rakudos:

say do given 42 | 'foo' { when 42 { True } } # OUTPUT: «False»

and that it's a surprising result for some folk, he said that they could/should instead write:

say do given 42 | 'foo' { when 42 ~~ $_ { True } } # OUTPUT: «True»

And, further, he was saying this should not work, even though, at least in some recent Rakudos, it does:

say do given 42 | 'foo' { when $_ ~~ 42 { True } } # OUTPUT: «True»

So it seems that this is not just a matter of spelling – it's something that actually will reduce the power of the ~~ operator.

I don't think so. You can write it either way around, and you will get the result that makes sense for that way around. What's at issue is what sense makes sense one way, and what sense makes sense the other way, and Larry is arguing that smart matching has been designed to be asymmetric for a decade plus, and this is helpful (due to optimizability and some other factors), and sufficiently reasonable, and that it's not that difficult to deal with, provided Rakudo doesn't have related bugs.

(I think I confused us all by focusing on abstractly talking about reversing foo ~~ bar, due to sloppy thinking/writing on my part, when what I was really trying to bring attention to was Larry's explanation of what I hope I've better shown above. It's about going from when foo to when $_ ~~ foo or, if that's not what's wanted, to when foo ~~ $_ (or when $_ === foo or whatever).)

# As raiph said, this *does*, in fact, check that all values are `Int`
my @foo where Int ~~ *.all = [1, 2, 3];
# But it seems like it shouldn't work?
say Int ~~ [1, 2, 3].all  # OUTPUT: «False»

Yeah, maybe it shouldn't and that's a bug. As far as confusion goes, I guess it's tag, you're it? :)

raiph commented 2 years ago

@vrurg

After [B] any further discussion loses any sense.

Hopefully what came before and after B made sense. I agree B is nonsense.

~~ is not commutative. Period.

Yes. I know that. Scrap B.

What I was trying to convey was what Larry wrote in his second email:

If you want to make use of a junction like that, you must write

    when 3 ~~ $_ {...}

or

    when 3 == $_ {...}

or so. This is enough to tell the optimizer not to make a jump table. (Though conceivably spesh could still do that for cases where $_ is provably an integer, I guess.)

not only "is a type?" kind of ops should be considered here, but any user-implemented ACCEPTS method, about which we're not able to make any assumptions.

Yes. In general, non-type invocant calls of ACCEPTS should not auto-thread, but if a user defines an ACCEPTS method then it should of course get used.

how did it happen that the statement "junction must not autothread on LHS" eventually turned into "junction on LHS must be disabled"?

I'm not following.

First, as I tried to emphasize at the start, the key to making sense of this issue was and remains to focus on what Larry wrote. In his emails, in the design documents, and on IRC.

Second, he did not write that junctions on the LHS should not auto-thread. He wrote that they should not in general. As codesections has pointed out, Larry himself created the junction ~~ Type idiom in which the junction on the left does auto-thread, so it's reasonable to conclude he wasn't saying that junctions on the LHS should not auto-thread.

Third, who has said they should be disabled? I was trying to suggest that it might make sense if, in general, using a junction as the topic of an ACCEPTS, when the invocant is an instance of a type that clearly isn't going to meaningfully match or not match a junction, results in an error message saying it's probably not going to do what the user means, and perhaps they mean to do such-and-such?

I'm not currently getting deep into this, but giving ACCEPTS a candidate to handle junctions effectively turns autothreading off for the method.

It's not our business to intervene and prohibit the method from implementing the same kind of behavior we would currently get by autothreading over it.

I never intended this issue to be about prohibition. I wanted and still want it to be about paying attention to:

* "In general, the left side of ~~ should not be auto-threading".
* Bugs in smart match / junction combinations in Rakudo.
* Inconsistencies in smart match / junction combinations in Rakudo.
* Surprises in smart match / junction combinations in Rakudo.
* What was evident in the email thread, and what Larry wrote in his replies, and in the original design docs for smart matching.

So, does it mean we have to guard over its return value afterwards and block any junction returned?

I'm lost. The original design of smart matching in S03 seems clean and clear to me, and we just need to get back to that as the foundation for fixing existing bugs, and enabling the optimizations the original design envisaged. That is all. I really don't think the spiralling complexity you're getting into is relevant to solving the issues I was hoping might get sorted out, starting with understanding what Larry wrote in the email thread.

vrurg commented 2 years ago

I've got a little bit of spare time on my hands to reply here... @lizmat first. :)

It is sort which doesn't know how to handle junctions, thus it is its job to throw whenever such element is encountered.

Checking every element for Junctions would not help in the performance of .sort :-(

Things are not that bad, after all. A new candidate on cmp to block junctions would, in fact, result in the same kind of check. It could be slightly better optimized, perhaps. But it would still be there.

vrurg commented 2 years ago

@raiph, what I must agree with you for certain, is that it all gets too complex. Though when it comes to my side, this is mostly due to me trying to reply to some arguments.

Anyway, I hope to simplify things a bit. Unfortunately, the lack of time doesn't allow me to resolve a few confusions my statements caused earlier. But two things I have to make clear, though in a much shorter (and thus prone to mistakes and new confusions) way than before: I accidentally closed the tab with this page a couple of minutes ago, effectively destroying my previous reply... (facepalm)

First, however valuable a source of information the synopses and Larry's comments are, I don't take them for granted and apply some amount of critical thinking to them. The synopses are now considered historical documents for a good reason: the language has come a long way since they were written. We, as a community, now have much more experience.

Considering Larry's opinion, stated in his message under the link in your first comment, I totally disagree that 3.ACCEPTS(any(3,4)) must be False. Let's take another method, which doesn't implement special handling for junctions (i.e. it doesn't have a signature with a Mu or Junction typed parameter), but otherwise does the same as the ACCEPTS in question. What would we get from supplying it with the same any? any(True, False), apparently, which collapses into True. So, what is proposed by Larry would only cause even more confusion. There is another aspect to this, but I should get back to it later, as well as to why Larry is right about the smartmatch in general.

Each of the bold bits is special. That's a whole lot of ACCEPTS (very deliberately and emphatically) being in a special domain! :)

Basically, I can't get around this because you apply what was written about smartmatch as an operator to ACCEPTS method. This is not right, in my view.

Now, I'd better skip a few other statements, and get straight to the point, where you might agree with me. :) In the meantime just one little stop: be careful when you propose. Because it wasn't only me, who considered your original approach as the way to prohibition of junctions on ~~ LHS. rakudo/rakudo#4620 proves it. :)

`~~`

As a matter of fact, ~~ doesn't autothread already. Consider its signature: multi sub infix:<~~>(Mu \topic, Mu \matcher).

So, what we have here is that nothing needs to be done to conform to the original design and Larry's later interpretations.

The only thing that remains problematic is the optimization. But as I don't consider disabling junctions the right way, the only right approach in sight becomes possible when RakuAST comes. With it, it should be possible to infer the type of the object on the LHS and make decisions based on it. For any non-Mu or non-Junction typed scalar, positional, or hash the compiler would be able to produce fast-path bytecode. This is where the statement about the optimizer possibly ignoring any later defined ~~ infix would come into play. Whatever.

The same would apply to when optimization, where reliable inference of the topic type could help produce table-based dispatch when possible.

`ACCEPTS`

This is our primary source of confusion for now. Which could be resolved with some ease, actually. For this we should consider the body of ~~ operator:

    matcher.ACCEPTS(topic).Bool;

Simple as it is. What does it tell us? ACCEPTS is not obliged to return a boolean! Something boolify-able (and sorry if I wrote this incorrectly) – yes, sure. Is a junction boolify-able? Certainly – yes.

Thus we come to the point where one of my previous statements steps in: sometimes we must allow the compiler to implement autothreading over ACCEPTS to get the right result! In the 3.ACCEPTS(any(3,4)) example, 3 | 4 ~~ 3 will work as expected as long as there is no ACCEPTS(Mu) or ACCEPTS(Junction) candidate, which might mess things up unintentionally!
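
To make this concrete, here is a minimal sketch with a made-up class (Exactly is hypothetical and is not how Int.ACCEPTS is implemented in the core). Its only ACCEPTS takes an Any-typed topic, so a junction argument autothreads and the call returns a junction of results, which then boolifies in the usual way:

```raku
# Hypothetical matcher class with a single, Any-typed ACCEPTS.
class Exactly {
    has $.value;
    # No Mu or Junction candidate here, so a junction topic autothreads.
    method ACCEPTS(Any $topic) { $topic === $!value }
}

my $matcher = Exactly.new(value => 3);
my $result  = $matcher.ACCEPTS(3|4);
my $miss    = $matcher.ACCEPTS(5|6);

say $result;    # any(True, False)
say ?$result;   # True
say ?$miss;     # False
```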

Another such example is Range.ACCEPTS, which I mentioned above. It is also a source of confusion:

    multi method ACCEPTS(Range:D: Mu \topic) {
        (topic cmp $!min) > -(!$!excludes-min)
          and (topic cmp $!max) < +(!$!excludes-max)
    }

The problem here is that it ignores the fact that topic can be a junction and uses it as a discrete value. As soon as we replace Mu with Any, the problem would be gone! Unfortunately, there are other problems with the in-core implementation of ACCEPTS that prevent this problem from being fixed so easily.

BTW, if I'm not mistaken, the method would work correctly if Order values boolified correctly. Unfortunately, as long as so(1 cmp 1) is False, the method will keep producing an incorrect outcome. And, unfortunately, we already have code which relies on this weird behavior.

But what I'm trying to say here is that, except for a couple of minor discrepancies, there is no contradiction between the current implementation of ~~/ACCEPTS in Rakudo and the original design and Larry's view. All the problems we have are in the area of implementation bugs which affect user experience. In particular, a rather considerable amount of redesign of the ACCEPTS multi-candidates is needed to resolve them.

And at this point I feel somewhat stupid because the amount of time I've spent on commenting here would, perhaps, be sufficient to have at least half of the job done. On the other hand, it was really good food for my brains. :)

vrurg commented 2 years ago

Interesting note. I've made another approach to reworking ACCEPTS in the core and the results are promising. But there is an unexpected conclusion: many things would be in much better shape if we'd autothread over Mu-typed parameters, leaving only Junctions alone.

lizmat commented 2 years ago

I was under the impression that the whole point of Mu was, that it would not autothread.

vrurg commented 2 years ago

It was. Strictly speaking, though, not autothreading isn't the point of Mu; accepting anything is. And yet, too many times I see Mu used solely for that purpose, without noticing that "anything" actually includes junctions. The above-mentioned Range.ACCEPTS is one example of such a method. When it was created, the side effects of junctions were not considered at all. There were a couple of other methods where I also noticed that passing junctions into them could result in undesired outcomes, though it'd be too hard to recall now exactly which methods those were.

Another point for threading over Mu is the multi-dispatch complexity caused by the need to handle junctions explicitly to simulate the desired behavior when one does need Mu and does want to thread in the usual manner. In this case two candidates, for both Junction and Mu, are needed, and the former would have to explicitly do topic.THREAD: {...} to simulate the default compiler behavior. In a way, this makes non-compiler code rely on implementation details. If we autothreaded over Mu, all this would work the other way around: one would need a Junction candidate only when the default compiler behavior is undesirable.

These arguments are opposed by two hard to solve problems.

First, possible performance impact. Where now we bypass any type checks whenever a target is Mu-typed, some additional processing would be required to deal with junctions. This impact could, possibly, be somewhat leveled down by new dispatchers, but at the cost of additional type guards, if I get it all right.

Second, problems with code which does expect junctions to be captured by Mu-typed targets. These are certainly solvable in the core, but would be harder to catch in the wild, except for published modules.

A few words on how the current state of things affects ACCEPTS. I was slightly overoptimistic when I wrote that the results are promising. At that point I thought I had managed to get Range done properly without a major overhaul of the ACCEPTS candidates. I was wrong. So far, the only feasible solution seems to be to give individual ACCEPTS prototypes to at least some of the core classes. This also means duplicating such crucial candidates as type checking (ACCEPTS(::?CLASS:U: ...)), and actual junction handling in cases where ACCEPTS over Mu is needed. The latter is the case where an explicit topic.THREAD: {...} is needed, the same as the one we currently have for Mu. In general this might even be good for performance, as the overall number of candidates per class gets lower, but memory footprint and ease of code refactoring will get worse.

BTW, this is a little bit off topic, but still related. It looks like many of the confusions of multi-dispatch could be solved if methods are prioritized based on MRO, so that the case of, say, :(Any:D: Any) and :(Child:D: Mu) would be resolved in favor of the Child candidate, instead of erroring over ambiguity. Unless I'm overlooking a hidden trap, this would be helpful in many cases.

So, the idea looks controversial even to myself. I like it in theory, but the practical implications are pulling back. If it is ever considered seriously, then I'd rather expect it to be a 6.e feature implemented with RakuAST, as this would allow much better call site analysis and early discarding of autothreading code when it is not needed.

jnthn commented 2 years ago

First, possible performance impact. Where now we bypass any type checks whenever a target is Mu-typed, some additional processing would be required to deal with junctions. This impact could, possibly, be somewhat leveled down by new dispatchers, but at the cost of additional type guards, if I get it all right.

It'd be worse than that. Type guards are on exact types. A Mu-typed parameter implies that we don't need to place any guard at all (in single dispatch, at least; in a multiple dispatch it needs every parameter in that position to have a Mu type constraint, or at least within the nodes that are tied in the topological sort of the candidates).

In our various aggregate types (Array, Hash, and friends), the type of the value parameter in ASSIGN-POS and friends is Mu. While ASSIGN-POS sees a vast number of type tuples in any significant application (every type ever stored into an Array), and so would be vulnerable to megamorphic blow-up, in reality that doesn't happen because the Array invocant and the index (an Int) are very stable, and there's no guards on the Mu. Break that, and while microbenchmarks will likely look alright, real world applications will suffer.

This isn't theoretical; I originally did have multiple dispatch putting guards on Mu-typed parameters, and fixed it in 9ad99eb93f3f after I found it causing full inline caches when investigating why a large application ran more slowly after new-disp (with this and a few other such megamorphic blow-up issues fixed, it now runs faster than before new-disp).

jnthn commented 2 years ago

It looks like many of the confusions of multi-dispatch could be solved if methods are prioritized based on MRO.

This was considered and rejected as early as A12.

Multiple dispatch is based on the notion that methods often mediate the relationships of multiple objects of diverse types, and therefore the first object in the argument list should not be privileged over other objects in the argument list when it comes to selecting which method to run. In this view, methods aren't subservient to a particular class, but are independent agents. A set of independent-minded, identically named methods use the class hierarchy to do pattern matching on the argument list and decide among themselves which method can best handle the given set of arguments.

(Granted a lot has changed since then, but this was revisited and upheld at other times too.)
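To make the A12 position concrete, here is a minimal sketch using multi subs. The classes `Animal`, `Dog`, `Cat` and the routines `meet` and `greet` are invented for this example. Every argument position takes part in candidate selection, and the first argument gets no priority.

```raku
class Animal { }
class Dog is Animal { }
class Cat is Animal { }

# Candidates are chosen by matching the whole argument list.
multi meet(Animal, Animal) { 'two animals meet' }
multi meet(Dog, Cat)       { 'dog chases cat' }
multi meet(Cat, Dog)       { 'cat hisses at dog' }

say meet(Dog.new, Cat.new);  # dog chases cat
say meet(Cat.new, Dog.new);  # cat hisses at dog
say meet(Dog.new, Dog.new);  # two animals meet

# Each candidate is narrower in a different position.
multi greet(Dog, Animal) { 'narrower on the first argument' }
multi greet(Animal, Dog) { 'narrower on the second argument' }

say greet(Dog.new, Cat.new);  # narrower on the first argument

# For (Dog, Dog) neither candidate is narrower than the other, so the
# call should be rejected as ambiguous rather than the first position winning.
try greet(Dog.new, Dog.new);
say $!.defined ?? 'ambiguous call rejected' !! 'a candidate was chosen';
```

Under MRO-based prioritization, the last call would presumably pick the candidate that is narrower on the first argument.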

vrurg commented 2 years ago

> It'd be worse than that. Type guards are on exact types.

Funnily enough, this comment comes at about the time I basically finished playing with the feasibility of the "autothreading over Mu" approach. Aside from the performance considerations, another big problem arises for which I don't see a good solution: whenever a signature has multiple parameters that could accept junctions, there is no easy way to handle all possible combinations of junction and non-junction arguments.

I was also considering introducing a container-like class to be used as a transport for basically any kind of object, to pass "fragile" or "hostile" things around. Say, call it Parcel and make it semi-transparent, i.e. under some circumstances it could silently repack into a scalar. But at first glance the idea looks like an introduction of extra complexity and, likely, extra performance losses.

So, this was an interesting experiment, but I'm done with it. Though if anybody is interested in having a closer look, here is the link: https://github.com/vrurg/rakudo/tree/autothread-over-Mu.

I need to get some tasks sorted out first, and then I will try to sort out the ACCEPTS candidates for the problematic cases.
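For readers less familiar with the current rules, a small sketch of how parameter types decide whether a junction autothreads. The routine names are made up for this example. Parameters typed Any (the default) autothread, while Mu-typed parameters receive the Junction itself, so with several Mu parameters the routine body has to deal with every mix of junction and non-junction arguments on its own.

```raku
sub describe-any($x)   { $x.^name }  # untyped parameters default to Any
sub describe-mu(Mu $x) { $x.^name }

my $j = 1|2;
say describe-mu($j);   # Junction
say describe-any($j);  # any(Int, Int)

# With Mu-typed parameters nothing is threaded on the caller's behalf:
# each argument arrives as passed, junction or not.
sub raw-pair(Mu $a, Mu $b) { $a.^name ~ ', ' ~ $b.^name }

say raw-pair(1|2, 3);    # Junction, Int
say raw-pair(1, 3|4);    # Int, Junction
say raw-pair(1|2, 3|4);  # Junction, Junction
```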

vrurg commented 2 years ago

rakudo/rakudo#4653 and raku/roast#774 are resolutions for this discussion.

vrurg commented 2 years ago

rakudo/rakudo#4653 & rakudo/rakudo#4698 resolve this issue. I think it can be closed now.

raiph commented 2 years ago

It's great to see all those resolutions of issues.

I'm surprised they resolve this issue.

The core of this issue was Larry's view, which jnthn has said he was inclined to agree with, that "In general, the left side of ~~ should not be auto-threading", and addressing that. You had earlier said you point blank disagree with Larry and jnthn, and I hadn't caught a shift in that split of opinions. Are you saying that all key parties are now aligned? Is the new view consistent with Larry's?

vrurg commented 2 years ago

@raiph I'm tired of this discussion. ~~ doesn't autothread. It works with junctions on the LHS, just as well as when statements do. It does so in a way that breaks no spectests and resolves some of the above-mentioned issues. Moreover, I took care of static optimizations for some common cases. I also fixed some bugs introduced by earlier optimizations.

After all, I provided links to PRs where things are explained in great detail and the commits are well commented.

I'm not getting back into all this again.