patrickbkr opened 1 year ago
In https://github.com/rakudo/rakudo/pull/5158 I tried replacing the continuation based solution strategy with a much simpler approach: always execute the whenever taps only after the supply block itself has run. That behavior is a lot easier to understand.
It does contradict the syntax though. Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed. Only executing them after having processed the supply block is thus highly counterintuitive.
Additionally it makes code like the following impossible to write:
# Simplified from t/spec/S17-promise/nonblocking-await.t
my $started = Promise.new;
my $ok = start react {
    whenever IO::Socket::Async.listen($s-address, $port) -> $conn {
        whenever $conn.Supply(:bin) -> $buf {
            await $conn.write: $buf;
            $conn.close;
        }
    }
    $started.keep;
}
In that piece of code a Promise is used to track the setup of the react block. When whenever tapping is delayed until after the processing of the react block, that $started.keep happens too early.
I thus consider https://github.com/rakudo/rakudo/pull/5158 a non-solution. At the moment I have run out of ideas of how to solve this.
This is unexpected. The order seems reversed. Why is this so?
A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).
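A minimal sketch of these semantics (a hypothetical example, not taken from the thread; the interval source and timings are arbitrary):

```raku
my $s = supply {
    whenever Supply.interval(0.1) -> $n {
        # An await here serializes everything: the next interval tick is
        # not processed until this Promise is kept.
        await Promise.in(0.5);
        emit "slow $n";

        # For more concurrency, subscribe to the asynchronous work with a
        # nested whenever instead; the outer whenever can then finish and
        # the next message be processed while the Promise is still pending.
        whenever Promise.in(0.5) {
            emit "concurrent $n";
        }
    }
}
```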
The body of a supply or react block itself is much like an OO constructor: just as we expect the constructor to complete before we have a complete object ready to receive method calls, we expect the setup phase of the supply or react block to run to completion before it processes any other messages.
Thus, the behavior in that example seems as intended.
Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.
Yes, it's important for correctness that the subscription is established right away.
The currently implemented solution for this deadlock problem in https://github.com/rakudo/rakudo/commit/26a9c313297a21c11ac30f02349497822686f507 and https://github.com/rakudo/rakudo/commit/547839200a772e26ea164e9d1fd8c9cd4a5c2d9f works as follows. When - in the setup phase, during the processing of a whenever tapping - an await happens (be it a protect, acquire, lock or plain await) a continuation starting at the root of the whenever tapping is taken. Only after the supply block itself finished running the continuation is resumed. So the order in which the code is executed is dependent on whether there is any locking going on in code tapped by the whenever.
This behavior is difficult for an unknowing programmer to follow. Even a programmer knowing about this behavior will potentially have a hard time knowing whether there is a lock somewhere in the tapped code that could cause this reordering of code. This is a case of action-at-a-distance.
The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in https://github.com/rakudo/rakudo/issues/5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation juggling approach to resolve the conflicts over the alternative of scheduling the continuations using the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil; I'm curious if that would solve the problem at hand here and be more robust generally.
Thanks for your input!
This is unexpected. The order seems reversed. Why is this so?
A supply block (and thus react) provides concurrency control by enforcing one-message-at-a-time semantics: that is, we must completely process a message (reach the end of the whenever block) before we process the next one. An await inside of a whenever block thus prevents execution of any other whenever block starting until the awaited event completes. (If one wants more concurrency, the answer is to instead use a nested whenever; were await to have different semantics then there'd not be a means to achieve back pressure).
That question was meant as the heading for the following paragraph. But thanks for explaining anyways! ;-)
Thus, the behavior in that example seems as intended.
Which example are you referring to? The one in the first post does trigger the await magic and manages to change the order in which the supply block and tapping are executed.
Currently whenever blocks can be placed anywhere in a supply block, which strongly suggests that they are processed where they are placed.
Yes, it's important for correctness that the subscription is established right away.
I agree.
The core of the issue really is that supply and react are primarily a tool for managing existing concurrency, rather than a means to introduce concurrency. However, in an attempt to DWIM, we try to permit a supply to immediately emit in its body, rather than just subscribe to other underlying sources and emit based upon events from those. The BlockAddWheneverAwaiter approach, as noted in rakudo/rakudo#5141, seems to have shortcomings when one mixes supply blocks with use of a Supplier, for example.

I picked the continuation juggling approach to resolve the conflicts over the alternative of scheduling the continuations using the normal thread pool scheduler and letting the standard await mechanism take over, primarily on the basis that if supply blocks really were going to have to get into the business of introducing concurrency, then at least they should avoid introducing threading too. Perhaps, however, the latter would have been the lesser evil
I can try implementing that. So that I get it right: I keep the BAWA (BlockAddWheneverAwaiter), but instead of putting the continuations in a list for later processing, I'd just instantly schedule the continuations on the ThreadPoolScheduler (not on the outer $*AWAITER, which would be a BlockingAwaiter and thus block the BAWA itself).
I'm curious if that would solve the problem at hand here and be more robust generally.
I guess that depends on whether we want to be pedantic. If the aspiration is to never have surprising order reversals, then queuing things on the thread pool isn't a step forward. I am confident though that having things in the thread pool will solve https://github.com/rakudo/rakudo/issues/5141 (the issue that made me explore the supply / whenever implementation in the first place). In that deadlock situation a continuation earlier in the list blocks on a lock a later continuation holds. But there is no "A waits on B, B waits on A" situation. So if the other continuations were allowed to run, things would work out.
It still feels wrong to give locks not involved in the supply setup the same treatment. I think I'll start another experiment: I'll remove the entirety of the BlockAddWheneverAwaiter and see if the queue-on-recursion logic implemented in https://github.com/rakudo/rakudo/commit/547839200a772e26ea164e9d1fd8c9cd4a5c2d9f suffices to prevent deadlocks on construction. I think I have not yet seen a situation where a lock we needed to resolve during construction was not a recursion lock.
Just removing the BlockAddWheneverAwaiter logic basically means that recursing emits during supply construction instantly return, thus killing back-pressure. All tests and spectests pass except for a single test:
tap-ok Supply.zip(
        Supply.from-list("a".."e"),
        Supply.from-list("f".."k"),
        Supply.from-list("l".."p")
    ),
    [<a f l>, <b g m>, <c h n>, <d i o>, <e j p>],
    "zipping with 3 supplies works";
The supply is done before a single value is emitted. That happens because the first supply gets to emit all its values and done (each being put into the protect-or-queue-on-recursion queue) before the others. That queue is then processed in order, causing the zip to be done as soon as all the messages of the first supply have been processed.
Technically it still works as documented (The docs state: "The resulting supply is done as soon as any of the given supplies are done."), but I'm unsure if that behavior is acceptable.
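To make the documented semantics concrete, here is a hypothetical example (not from the test suite): since the zip is done as soon as any input is done, the shortest input bounds how many tuples can appear.

```raku
my @result;
Supply.zip(
    Supply.from-list(1..3),    # three values, then done
    Supply.from-list(10..15),  # six values
).tap({ @result.push: $_ });
# At most three tuples can be emitted: (1 10), (2 11), (3 12).
# Under the changed implementation discussed above, the first supply's
# done may be processed before the second supply gets to emit, so fewer
# tuples (possibly none) are observed.
```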
I have now also tried making the BlockAddWheneverAwaiter queue its continuations on the thread pool. That gets make test to pass and make spectest to hang in S17-supply/[flat|batch|unique|merge|zip].t, but succeed otherwise.
I have now pushed two branches to GitHub:
Also to note, both branches solve the deadlock problem in https://github.com/rakudo/rakudo/issues/5141
I'll see if I can get the put-continuations-on-thread-pool solution working (i.e. get the hanging tests to pass).
In general I like the solution removing the BlockAddWheneverAwaiter better. I believe it's easier to understand what's going on, as only the locks of the supplies themselves are affected. Also there are no continuations involved.
@jnthn What do you think?
I found the reason for the hangs with the schedule-continuations-on-the-thread-pool approach. The hanging tests replace $*SCHEDULER with CurrentThreadScheduler, which purposely blocks on calls to cue. Since we rely on the scheduler to execute our continuations concurrently to prevent deadlocks, the code can't resolve locks with a CurrentThreadScheduler.
The docs say the CurrentThreadScheduler is useful mostly for testing. So what's the point of having code run with the CurrentThreadScheduler? Is it forbidden to rely on the scheduler to resolve locks?
With the schedule-continuations-on-the-thread-pool approach there is also a test failure in t/spec/S17-supply/flat.t:
my $s = Supplier.new;
my $f = $s.Supply.flat;
my $seen1 = [];
my $t1 = $f.tap( { $seen1.push: $_ } );
$s.emit([1,2]);
is $seen1, [1,2], 'did we get the first emit (1)';
Cause: Emitting on a .flat-ed supply involves setting up a whenever and processing values in src/core.c/Supply-factories.pm6:460. That in turn activates the BlockAddWheneverAwaiter, which causes the value to be emitted asynchronously, making the $s.emit and the is in the test racy.
Perhaps, however, the latter would have been the lesser evil; I'm curious if that would solve the problem at hand here and be more robust generally.
My conclusion from the above analysis is that the schedule-continuations-on-the-thread-pool approach as I implemented it brings its own set of problems. I don't think these issues can be alleviated.
So the only approach I still have hope in is the remove-the-BlockAddWheneverAwaiter-completely approach I also explored. See this comment for the issue surrounding that approach.
Unless other ideas of approaching this come up, I think the solution space has been exhaustively explored. I'm now dependent on feedback whether we should go forward with one of the two explored approaches.
I'd like to make sure the state of this issue is clear.
jnthn has explained that we have to live with occasional reordering. That's known, intentional and we can't do much about it.
So the only problem left to be solved is the deadlock described in https://github.com/rakudo/rakudo/issues/5141. I explored two approaches trying to solve the deadlock issue:
So to be able to continue working on this issue I'd like to either have:
Supply.zip spec test and we could do a Blin run to see if there is more fallout.

So with the above summary I now ask for feedback. Ping @jnthn, @vrurg, @niner
I'm not much into supplies from their implementation side, so not much help on this. But removal of a non-obvious deadlock cause does sound good to me. What's not good is that the spec test mentioned can barely be changed without breaking backward compatibility. Thus I wonder whether option 1, should it eventually be accepted, could be a 6.e change? In which case the spec test would still define 6.c/d behavior and we'd have it different for 6.e.
Though it may also depend on the degree of change required for the spec to pass.
What's not good is that the spec test mentioned can barely be changed without breaking backward compatibility.
I only partly agree.
tap-ok Supply.zip(
        Supply.from-list("a".."e"),
        Supply.from-list("f".."k"),
        Supply.from-list("l".."p")
    ),
    [<a f l>, <b g m>, <c h n>, <d i o>, <e j p>],
    "zipping with 3 supplies works";
The test strongly relies on the supplies being processed round robin, where each supply only gets to emit a single value each round. Supplies never gave any guarantee along those lines. That this works out on the current implementation is because of a delicate implementation detail.
The test looks as if zip is meant to process incoming values as long as at least one supply is still active. But that's different from the current implementation and the documentation.
So I'm confident to say that there is a mismatch between test and implementation. One of the two must be wrong.
I do agree though that the behavior of my proposed changes is suboptimal in this case (processing all values and the done from the first supply before working on the second).
An alternative reading is that the spec really does require a round-robin processing by Supply.zip and the current implementation of this method is just very fragile and relies on current implementation details of other parts of Supply. So as part of the fix, method zip could be adjusted to deal with the new situation.
Maybe by creating a local queue for each of the child supplies and taking one element at a time out of this queue and asking the child supply for more if the queue is empty? This way it wouldn't matter if the child supply emits one or multiple values.
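That idea could look roughly like this (a hypothetical sketch, not the code of any actual PR; done-handling and error-handling are omitted):

```raku
# Hypothetical zip-like combinator keeping a local queue per input supply.
sub zip-buffered(*@sources) {
    supply {
        my @queues = @sources.map: { [] };
        for @sources.kv -> $i, $source {
            whenever $source -> $value {
                @queues[$i].push: $value;
                # Emit a tuple only once every queue holds a value, no
                # matter how many values one input emitted in a burst.
                if @queues.all.elems {
                    emit @queues.map(*.shift).List;
                }
            }
        }
    }
}
```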
The remove-the-BlockAddWheneverAwaiter-completely approach. This solves the deadlock, but has the effect that recursing emits during whenever setup do not block anymore but are queued. This causes one spec test failure, in a test that turned out to be questionable (see https://github.com/Raku/problem-solving/issues/364#issuecomment-1416843630). A positive side-effect of this approach is that locks in general do not cause reordering anymore; only recursion that would otherwise deadlock causes reordering. In my opinion this makes it easier to understand what's going on.
The schedule-continuations-on-the-thread-pool approach. This solves the deadlock as well, but it introduces concurrency. https://github.com/Raku/problem-solving/issues/364#issuecomment-1418285405 made clear that this concurrency is problematic. I think this renders this approach unsuitable.
If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing? In that case I think it's effectively achieving what 2 would, but by far more elegant and well-exercised means (e.g. we don't have to build anything new).
So long as we process one message at a time in a given supply and react instance (where the running of the supply or react block body is considered a message too), then it should be fine.
An alternative reading is that the spec really does require a round-robin processing by Supply.zip and the current implementation of this method is just very fragile and relies on current implementation details of other parts of Supply. So as part of the fix, method zip could be adjusted to deal with the new situation.
That's a reasonable reading in my opinion. I did a draft PR https://github.com/rakudo/rakudo/pull/5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.
If I understand option 1 correctly, this is simply saying that it is left to Lock::Async to do the queueing?
Yes, that's exactly what happens.
So long as we process one message at a time in a given supply and react instance (where the running of the supply or react block body is considered a message too), then it should be fine.
From all I know that invariant is not broken. Each supply block is still guarded by a lock and no code path enters that supply block without requesting that lock.
I did a draft PR rakudo/rakudo#5200 that explores an alternative way to implement Supply.zip that passes all current spectests. It would be interesting to try it in combination with the deadlock fix; I think it should help.
I have requested a change on that draft PR. With that change make test and make spectest both succeeded. I have now created a PR with those changes in rakudo/rakudo#5202.
I now consider https://github.com/rakudo/rakudo/pull/5202 ready for review. That PR passes make test and make spectest, and resolves the original deadlock in https://github.com/rakudo/rakudo/issues/5141. I have turned that deadlocking code into a roast test in https://github.com/Raku/roast/pull/833.
The PR contains one controversial change in behavior:
In https://github.com/patrickbkr/rakudo/commit/547839200a772e26ea164e9d1fd8c9cd4a5c2d9f the recursion breaking logic and Lock::Async::protect-or-queue-on-recursion were originally implemented. The commit message states: "A holder's recursion competes fairly with outside messages, thanks to the queueing being through Lock::Async." This is no longer true, as the queued code objects are now called synchronously and no longer put on the thread pool.
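For context, this is roughly how that Lock::Async method is used (a usage sketch under my reading of the linked commit; `handle-message` and the message value are hypothetical):

```raku
sub handle-message($m) { say "processing $m" }   # hypothetical handler

my $lock = Lock::Async.new;
$lock.protect-or-queue-on-recursion: {
    # If the lock is free, the block runs under the lock right away. If the
    # current invocation already holds it (recursion), the block is queued
    # instead of deadlocking and runs once the holder releases the lock.
    handle-message("a message");
}
```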
This issue and the PR are now blocking on feedback / approval. @jnthn I think you have the deepest insight into this area of Raku. Thus I'd like to have your approval before going forward with my changes.
I've merged the proposed fix and respective test in https://github.com/rakudo/rakudo/pull/5202 and https://github.com/Raku/roast/pull/833. We now have a full month to find any regressions.
Now that the changes have been merged, what do we do with this issue? Do I need to condense all the comments into some prose and do a problem-solving PR?
This is a follow-up of https://github.com/rakudo/rakudo/issues/5141 and https://github.com/rakudo/rakudo/pull/5158.
Consider the following piece of code:
That piece of code prints
This is unexpected. The order seems reversed. Why is this so?
By default whenevers tap their supply instantly and as a result run the supply block they are connected to. This has a potential for deadlocking when that supply block (or one nested via more whenevers) contains an emit, as the emit will call the whenever block of the original supply, but by definition only a single execution branch is allowed in a supply block at any time. Possibly (I'm not sure about that) there are other scenarios posing similar deadlock issues without recursion being involved.
Also there still is a possibility of the code deadlocking left. That's detailed in https://github.com/rakudo/rakudo/issues/5141. I consider that a problem separate from this issue.