cplusplus / sender-receiver

Issues list for P2300
Apache License 2.0

{#design-fpg} Questions about forward_progress_guarantee and get_forward_progress_guarantee() #106

Open ericniebler opened 9 months ago

ericniebler commented 9 months ago

Issue by lewissbaker Wednesday Dec 29, 2021 at 11:02 GMT Originally opened as https://github.com/NVIDIA/stdexec/issues/349


Does the get_forward_progress_guarantee() CPO only apply to the schedule() operation? Do we intend it to also apply to all future operations we may add to the scheduler, e.g. a schedule_after() operation, or a file-read operation on a file associated with an io_scheduler?

And does the guarantee need to be with respect to all other threads in the system? Or only with respect to other tasks scheduled with that same scheduler?

There also doesn't seem to be a good definition of what constitutes an "execution agent created by that scheduler". In the case of the run_loop context, is the thread that calls run() "created by that scheduler"? Is each task enqueued to the run_loop context a separate execution agent?

What would be an example of a scheduler that provides the concurrent guarantee? A thread pool that shares a thread between tasks enqueued to it would only provide parallel. A new_thread_scheduler that always spawned a std::thread may still fail to spawn the thread, in which case the task is never executed. It can still only guarantee that the task will run with concurrent forward progress once it starts, and so presumably also only provides parallel forward progress?

Is a schedule() operation that completes with an error still required to provide the guarantee on the context that calls set_error()? Or does the get_forward_progress_guarantee() result only relate to the context on which schedule() operations complete with set_value()?

ericniebler commented 9 months ago

Comment by LeeHowes Wednesday Dec 29, 2021 at 18:42 GMT


Do you think we should support different types of forward progress on the same scheduler? That might be too fine-grained and confusing, though we could certainly imagine some sort of schedule(fp_guarantee) operation.

The guarantee would be consistent with the other guarantees we make. I don't think it's relative to anything as such. I have long argued that we should try to be more fine-grained about this, though. Maybe this is the right time to try to do that.

Maybe all of that becomes too fine-grained to reasonably deal with, though, and everything is relative to something more abstract.

> Is each task enqueued to the run_loop context a separate execution agent?

It has to be that. Every instance of schedule() (eventually, if started etc) creates a new agent.

> It can still only guarantee that the task will run with concurrent forward progress once it starts and so presumably also only provides parallel forward progress?

I believe Michael and I discussed this a few weeks back in the meeting. A new-thread scheduler would be concurrent in the sense that even concurrent agents can have some limit. Optimally, we would say it is concurrent, or it fails: a truly concurrent scheduler would launch a thread, have some way to detect if that thread fails to launch, and signal an error under those conditions.

ericniebler commented 9 months ago

Comment by lewissbaker Thursday Dec 30, 2021 at 23:52 GMT


One issue with supporting something like concurrent forward-progress is that I might spawn two tasks, both of which call arrive_and_wait() on a barrier initialised with a count of 2. If the first task starts and then the second task fails to start, e.g. because of resource exhaustion, then my algorithm will not complete as it requires that the second task also starts so that it can unblock the first task.

I think concurrent forward progress makes sense within narrow scopes where we can know how many execution agents we are going to need to create up-front (e.g. when spawning a bulk operation with a given iteration shape) but I worry that it's not going to be implementable in general for schedule() where an application could spawn an arbitrary number of tasks.

I suspect that most use-cases where an algorithm requires concurrent forward progress it will have some kind of group of tasks that will block on each other (e.g. because they all wait on the same std::barrier) and it will require that either all of these tasks start or that none of these tasks start.

I'm not sure that an algorithm that requires concurrent forward progress could do so reliably on top of a fallible schedule() operation by spawning one task at a time, unless we add some kind of cancellation support to std::barrier::wait() and other similar facilities (e.g. std::counting_semaphore::acquire(), std::latch::wait(), std::atomic::wait(), or any other blocking synchronisation operation).

However, if the schedule() operation either guarantees concurrent forward progress or fails promptly, then maybe we can build such an atomic task-group facility on top of that, e.g. by having all of the spawned tasks immediately block on a barrier and wait until all tasks are launched successfully before starting to do the work?

This seems like somewhat of a weaker guarantee than general "concurrent forward progress" though. Separate task-groups would not be able to rely on the other task group starting and so the concurrency guarantees wouldn't necessarily compose into higher-level operations.

ericniebler commented 9 months ago

Comment by LeeHowes Friday Dec 31, 2021 at 01:11 GMT


Maybe the question is what it means to fail in practice. We can rely on infallible `new` in many realistic circumstances. Is it reasonable to say that we rely on infallible concurrency unless you hit extreme limits, at which point we terminate?

ericniebler commented 9 months ago

Comment by LeeHowes Friday Dec 31, 2021 at 01:14 GMT


Or we limit its interpretation to bulk, such that bulk has to guarantee all or nothing if the scheduler requests concurrent? That still makes it hard to create fork-join parallelism but maybe not impossible.

lewissbaker commented 1 month ago

I still think this is an issue that we need to resolve in the wording of [exec.get.forward.progress.guarantee] in P2300R10.

The current wording talks about forward progress guarantee of "execution agents created by the scheduler's associated execution resource".

And in [async.ops] p10 we talk about schedulers being factories for senders whose operations execute value completion operations on such an execution agent.

However, we don't say that starting a schedule() operation creates a new execution agent and so we don't really have any words to talk about the forward progress guarantees of individual schedule operations that schedule work onto a given execution resource.

I think this means that, at best, we can only meaningfully provide parallel forward progress guarantees for schedule operations, even if the underlying execution agents executing those tasks can themselves provide concurrent forward progress.

To provide stronger guarantees we'd need to talk specifically about the relative forward progress guarantees of various operations scheduled on a given execution context. This could potentially be done by defining starting a 'schedule' operation as creating a new execution agent.

But then we'd also need to consider forward progress guarantees of other operations like bulk() - both with respect to individual executions of the function, but also with respect to other bulk() or other schedule() operations scheduled on the same scheduler.

There is also a whole area of forward-progress guarantees around cancellation that we need to consider. e.g. is an operation guaranteed to complete synchronously in response to a stop-request or is its completion dependent on the forward progress of other tasks currently executing on the associated execution resources?

LeeHowes commented 3 weeks ago

Yes, there's some complexity here.

We need an idea of the forward progress guarantee of operations. I don't know if we need bulk, or any other algorithm customization, to be different from schedule. Operation-level guarantees can be based on schedule, and be defined by the guarantee that once start() is called, set_value() will be called (I say it that way in case we need nuance for stop and failure modes). That is, however large the fan-out of the operation is, the overall operation likely obeys parallel FP.

Then we have the question of what the minimum safe FP guarantee is for calling schedule. So schedule may guarantee parallel FP when called from a thread, but if start() is called from a weaker agent, when is it safe to call at all, and is its guarantee downgraded? bulk() can't guarantee parallel FP if started from a weakly parallel agent and it runs on that same agent.

Then there is the FP guarantee of agents created by the operation. bulk's individual work items may have a weaker guarantee, of course, and that may or may not depend on the agent that started them. If bulk is serialized, clearly it depends on the starting agent. If bulk is dispatched to a thread pool or GPU, then its internal guarantee depends on that separate executor, but its completion may have a weaker guarantee.

I'm not sure I have a clear picture of how all of this interacts in practice.