cplusplus / sender-receiver

Issues list for P2300
Apache License 2.0
19 stars 3 forks source link

Should the default implementation of `bulk` support cancellation? #278

Open lewissbaker opened 2 months ago

lewissbaker commented 2 months ago

In P2300R10, the default implementation of bulk is specified to just call f N times in a loop.

However, this loop does not attempt to inspect any stop-token to watch for a stop-request. This means that once it starts executing there is no way to get it to stop.

If this is a loop of N=1'000'000 iterations that will run for 30s+ then there could be quite a large latency waiting for the operation to complete in the case that the result is no longer required and a stop-request has been issued.

Should we be requiring that the default bulk operation periodically checks for a stop-request if the receiver connected to the bulk operation has a stop-token with stop_possible() equal to true?

Customizations need not be required to check for a stop-request, but we should at least consider specifying the default implementation to check for a stop-request and early-exit the for-loop.

inbal2l commented 2 months ago

default bulk operation periodically checks for a stop-request

Why not on every iteration? Seems more intuitive to me. BTW, are there any other mechanism by which we can do this besides "spin"?

lewissbaker commented 2 months ago

Why not on every iteration?

It depends on how much work each iteration of the bulk is doing.

If it's doing operations that are a few instructions, then the synchronisation needed to ask if there is a pending stop request (usually an atomic-acquire-load) might be quite an overhead to add to each iteration.

It may be worthwhile to unroll some number of iterations of the loop and only check for a stop-request after each batch. This might allow more optimizations like vectorization of the inner loop.

It's not clear, however, if the batching should be performed by the user or by the implementation of bulk, though. e.g. the user could always pass 'count / 8' as the shape and then in function perform '8' iterations.

GorNishanov commented 1 month ago

TLDR:

  1. yes, default implementation of bulk need to support cancellation
  2. no. we should not specify that cancellation needs to be checked on every iteration

Discussion:

I think we should specify periodically, as opposed to on every iteration.

I expect a default high-quality implementation of bulk to use adaptive partitioning and thus it will be running multiple work items processing a batch of items in a tight vectorizable loop.

bulk(N, f) 

   --> will result in concurrent executions of

void execute_chunk(unsigned start, unsigned end, const F& f)
   #pragma loop(ivdep)
   for(; start < end; ++start)
       f(start);
}

Scheduler that does chunking will be responsible for chunking and cancellation. The execute_chunk should be focusing on running the best possible loop with no interruptions. (We want the compiler to turn this loop into SIMD instructions as appropriate)

Requiring the check on every iteration will be detrimental.

The implementation I am experimenting right now, adapts to chunk running for less than, say 1ms, balancing a good compute performance with making sure they are not hogging threadpool and letting other work to run (and allowing for cancellation checks as well)

inbal2l commented 1 month ago

After discussion, the action that got the most consensus: Reword [exec.bulk] so that to leave it implementation-defined on how and whether “bulk” is not executing (cancellable), when getting a “stop token”. Do this for both the default “bulk” implementation (by modifying [exec.bulk] p4 and the customizable bulk (for which limitations are described at [exec.bulk] p6).