With https://github.com/rakudo/rakudo/pull/2903, I think <== and ==> are brought in line with the spec.
The question is how <<== and ==>> are supposed to work. Should code like this be allowed to run?
[4,5,6] ==>> [1,2,3] ==>> my @foo;
Or should only one appending feed operator be allowed at a time?
my @foo;
@foo <<== [1,2,3];
@foo <<== [4,5,6];
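For what it's worth, the one-at-a-time version above seems equivalent to plain append calls (an assumption about intended semantics, not a description of current behaviour):

    my @foo;
    @foo.append([1, 2, 3]);   # assumed meaning of @foo <<== [1,2,3]
    @foo.append([4, 5, 6]);   # assumed meaning of @foo <<== [4,5,6]
    say @foo;                 # OUTPUT: [1 2 3 4 5 6]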
If more than one is allowed, should the appending feed operators also be allowed in combination with their assigning counterparts, like this?
my @even <== grep { $_ %% 2 } <== 1..^100;
@even <<== grep { $_ %% 2 } <== 100...*;
Also, from the parallelization pullreq:
There's a problem with this... this benches slower than the current implementation of feed operators, even when there's blocking I/O going on at the same time. I think more discussion is needed about whether this should be implemented.
Feed operators were benching much faster in the first pullreq I made. Should we ignore the spec about parallelizing feed operators?
FWIW, I don't think feeds need to create containers, so we can have that performance benefit. It's only the storing into the endpoint that should create containers, if the receiving end wants that (e.g. Array vs List).
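For illustration, that Array-vs-List container distinction at the endpoint (runnable on current Rakudo):

    my @a = 1, 2, 3;        # Array: each element lives in a Scalar container
    say @a[0].VAR.^name;    # OUTPUT: Scalar
    my $l = (1, 2, 3);      # List: bare values, no per-element containers
    say $l[0].VAR.^name;    # OUTPUT: Int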
Disregard what I said about ignoring the spec; I figured out how to get parallelized feed operators to run 5x faster than the current implementation.
Before I can continue with my pullreq, there's something that needs to be resolved. Modules in the ecosystem are using feed operators with things that aren't iterable. Here's an example from CUID:
sub timestamp {
    (now.round(0.01) * 100)
        ==> to-base36()
        ==> adjust-by8()
        ==> padding-by8()
}
Should this behaviour be preserved?
Does that currently return an array or a scalar?
A scalar
Then I think an nqp::p6store will take care of that eventuality.
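For reference, the current scalar-feed behaviour under discussion: a non-iterable value fed with ==> is simply passed as the final argument of the call on its right, so a scalar flows through (runnable on current Rakudo):

    sub double($n) { $n * 2 }

    42 ==> double()   # same as calling double(42)
       ==> say();     # OUTPUT: 84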
Before I can continue with my pullreq, there's something that needs to be resolved. Modules in the ecosystem are using feed operators with things that aren't iterable.
My feeling is that any function you feed a value into had better be happy with getting its input as a final extra Iterable argument (presumably a Seq with an underlying iterator that is pulling from a Channel). Or, once we support it, such an argument at the insertion point.
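A sketch of that calling convention, using Channel.Seq for the Channel-backed Seq (the insertion-point form is left out since it isn't supported yet):

    my $in = Channel.new;
    $in.send($_) for 1..5;
    $in.close;

    # The stage receives the upstream values as a trailing Iterable argument:
    sub double-all(Iterable $input) { $input.map(* * 2) }
    say double-all($in.Seq);   # OUTPUT: (2 4 6 8 10)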
If we've things in the ecosystem that don't play well with that - which I don't believe is the case for the example given here - we may need to preserve the existing semantics for 6.d and below, and introduce the new ones for 6.e.PREVIEW and onwards.
The feed operators really haven't had that much attention to date. The implementation before the recent work was very much a case of "first draft", and certainly didn't explore the parallel aspects alluded to in the language design docs. I'd be surprised if we can make them behave usefully going forward without breaking some of the (less thought out, and probably accidental) past behaviors.
Also, some notes on the parallelism model with feed operators: it's quite different from the hyper/race approach.
In the hyper/race case, we take the data, divide it up into batches, and work on it. Where possible, for the sake of locality, we try to push a single batch through many operations, e.g. if you do @source.race.map(&foo).grep(&bar).map(&baz) then we'd send a batch, do the maps/greps in the worker, and send back the resulting values. In this model, the parallelism comes from dividing the input data. The back-pressure here is provided by the final consumer of the pipeline.
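Concretely, with stand-in foo/bar/baz definitions, that pipeline can be run today as follows; the three operations are fused per batch rather than each stage owning a thread:

    sub foo($n) { $n * 2 }
    sub bar($n) { $n %% 3 }
    sub baz($n) { $n + 1 }

    # Each worker takes a batch of 64 items through map/grep/map in one go;
    # with race the result order is not guaranteed (hyper would preserve it).
    my @result = (1..1000).race(batch => 64).map(&foo).grep(&bar).map(&baz);
    say @result.elems;   # OUTPUT: 333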
By contrast, the feed model is about a set of steps that execute in parallel. The parallelism is in the stages of the pipeline being run in parallel, not from the data items. It can be seen as a simple case of a Staged Event-Driven Architecture. Since a given stage is single-threaded, it may be stateful - whereas if you try and do stateful things in a map block in a hyper/race it's going to be a disaster. The backpressure model here would ideally be that once a queue becomes full, you cannot put anything more into it. One possible solution here would be to make Channel take an optional bound. Then a send into a Channel that is considered full would block, so you can't put more in, meaning a fast stage can't overwhelm a slow one.
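Channel takes no such bound today. A minimal sketch of the idea, capping in-flight items by pairing a Channel with a Semaphore (BoundedChannel is hypothetical, not an existing or proposed API):

    class BoundedChannel {
        has Channel   $!channel = Channel.new;
        has Semaphore $!slots;

        submethod TWEAK(:$bound = 64) {
            $!slots = Semaphore.new($bound);
        }
        method send($value) {
            $!slots.acquire;           # blocks while the channel is "full",
            $!channel.send($value);    # so a fast stage can't run ahead
        }
        method receive() {
            my $value = $!channel.receive;
            $!slots.release;           # frees a slot for a blocked sender
            $value
        }
        method close() { $!channel.close }
    }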
One slightly more general problem is that Channel today doesn't really fit our overall concurrency model very well: it blocks a real OS thread when we try and receive from it, whereas in reality we like non-blocking awaiting of things where possible. I mention that here mostly because I think the stages in a pipeline should be spawned on the thread pool scheduler, but it's quite clear that they won't be the best behaved schedulees with Channel as it exists today. Probably we should solve that at the level of Channel, though, so I'd just use Channel between the stages today. It means we get error and completion conveyance, which are easy to get wrong, so I'd rather not have more implementations of those. :-)
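For illustration, a hand-rolled two-stage pipeline in that style: the stage is spawned on the thread pool with start, and a Channel between the stages carries both values and, via close, completion:

    my $c1 = Channel.new;
    my $c2 = Channel.new;

    # The stage runs on the thread pool; being single-threaded, it could
    # safely keep per-stage state.
    start {
        for $c1.list -> $n { $c2.send($n * 2) }
        $c2.close;          # conveys completion to the next stage
    }

    $c1.send($_) for 1..5;
    $c1.close;
    say $c2.list;           # OUTPUT: (2 4 6 8 10)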
Some problems will be better parallelized with hyper/race, some with feed pipelines, but there's also the issue that some things aren't worth parallelizing at all. I fear the ==> operator is especially vulnerable to that: while I don't think too many folks will write .hyper because it looks prettier, they probably will write ==> for that reason. If we magically speed up their programs with parallelism that's great, but there's a decent chance it won't be worth it, and will in fact slow things down. That's a tricky problem, and it's also one we'll have to solve for the hyper/race model too. For now, I'd say just do the parallel implementation, and we'll investigate such heuristics and automatic decision making later. I don't think usage of ==> is widespread enough yet for us to really upset anything.
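For example, a quick way to see the risk (an illustrative micro-benchmark; absolute numbers vary by machine, but with per-item work this cheap the parallel overhead can easily dominate):

    my @data = 1 .. 100_000;

    my $t0 = now;
    my @serial = @data.map(* + 1);
    say "serial: {now - $t0}s";

    $t0 = now;
    my @parallel = @data.hyper.map(* + 1);
    say "hyper:  {now - $t0}s";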
The parallelization part of this is done; all that's left is support for <<==, ==>>, and *. I have a question regarding how <<== and ==>> should work though:
my @foo = (1, 2, 3);
(4, 5, 6) ==>> @foo ==>> my @bar;
say @bar; # OUTPUT: (1, 2, 3, 4, 5, 6)
What should the value of @foo be after running this? (1, 2, 3, 4, 5, 6) or (4, 5, 6)? I think (4, 5, 6) DWIMs better, but I'm not entirely sure.
See https://github.com/rakudo/rakudo/issues/2899 for the start of this discussion.