Open · bill-poole opened 1 year ago
Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.
Author: | bill-poole |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Collections` |
Milestone: | - |
Tagging subscribers to this area: @dotnet/area-system-threading-tasks
See info in area-owners.md if you want to be subscribed.
Author: | bill-poole |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Threading.Tasks`, `untriaged` |
Milestone: | - |
This is reasonable in theory; whereas the `ParallelOptions.TaskScheduler` can be used to coordinate how many tasks are allowed to execute across any number of consumers at a time, the proposed `ParallelOptions.ConcurrencyLimiter` would enable doing so even when there was no work to actually run.

That said, it could add non-trivial overhead to the processing (which might otherwise be very fine-grained). It would also add complexity in understanding the mind-bending relationship between what limits the scheduler might put in place, what limits the concurrency limiter might put in place, and `MaxDegreeOfParallelism`. And as you've outlined, someone can achieve the majority of the benefits just by using the concurrency limiter in the body of the loop; you mentioned a downside of large memory consumption, but the worst case is you're holding onto `MaxDegreeOfParallelism` items from the source, and if that's too large for the scenario, lower the `MaxDegreeOfParallelism`. Finally, there's also a practical issue, which is that `Parallel` is in the netcoreapp shared framework and `ConcurrencyLimiter` is not; taking this dependency would require pulling it in.
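For concreteness, the in-body pattern referred to here might look like the following sketch (`source` and `ProcessAsync` are placeholders, not part of the discussion):

```csharp
using System.Threading.RateLimiting;

// Process-wide limiter shared by every execution of the loop below.
var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 100,
    QueueLimit = int.MaxValue,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

await Parallel.ForEachAsync(source, async (item, ct) =>
{
    // Each body invocation waits for a permit before doing real work; note that
    // ForEachAsync still pulls up to MaxDegreeOfParallelism items from the
    // source while bodies wait here, which is the memory concern noted above.
    using RateLimitLease lease = await limiter.AcquireAsync(1, ct);
    await ProcessAsync(item, ct);
});
```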
I'm not opposed to keeping this issue open to consider it again in the future, but I wouldn't rush to try to do something here now.
Thanks @stephentoub for considering this proposal.
> if that's too large for the scenario, lower the MaxDegreeOfParallelism

Lowering the `MaxDegreeOfParallelism` isn't really a practical option in my case: the system needs a certain degree of parallelism to meet its performance requirements.
For example, let's say the system writes records to a database, each write has a latency of ~10 ms, and the system needs to write 100,000 records per second. The required degree of parallelism is therefore 1,000. If we lower the max degree of parallelism, then the system won't meet its performance requirements.
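That figure is just Little's law (average concurrency = throughput × latency):

$$L = \lambda W = 100{,}000\ \text{writes/s} \times 0.010\ \text{s} = 1{,}000\ \text{concurrent writes}$$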
Let's also say that the system performs many different types of operations, each of which writes a batch of an arbitrary number of records to the database, and that these operations are performed concurrently. If `Parallel.ForEachAsync` is used to write each batch of records to the database, then there is no `MaxDegreeOfParallelism` value that we can use that will both:

1. cap the total number of concurrent database writes across all concurrent operations at 1,000; and
2. allow any single operation to use the full 1,000 degrees of parallelism when few other operations are running.
That then leaves us with the option of rate limiting inside the `body` provided to `Parallel.ForEachAsync`, but that can cause the system to run out of memory, as we previously discussed.
After some additional consideration, however, I have found a third option: rate limiting the source enumerable given to `Parallel.ForEachAsync`:
```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

public static async Task ForEachAsync<TSource>(IAsyncEnumerable<TSource> source,
    Func<TSource, CancellationToken, ValueTask> body, RateLimiter rateLimiter,
    CancellationToken cancellationToken)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(body);
    ArgumentNullException.ThrowIfNull(rateLimiter);

    await Parallel.ForEachAsync(RateLimited(source, rateLimiter, cancellationToken),
        new ParallelOptions
        {
            MaxDegreeOfParallelism = int.MaxValue,
            CancellationToken = cancellationToken
        }, async (element, ct) =>
        {
            try
            {
                await body(element.Source, ct);
            }
            finally
            {
                // Release the permit whether or not the body succeeded.
                element.Lease.Dispose();
            }
        });

    static async IAsyncEnumerable<(TSource Source, RateLimitLease Lease)> RateLimited(
        IAsyncEnumerable<TSource> source, RateLimiter rateLimiter,
        [EnumeratorCancellation] CancellationToken cancellationToken)
    {
        await foreach (var element in source.WithCancellation(cancellationToken))
        {
            // Wait until there is sufficient capacity and then yield the next element to
            // be processed. This ensures we don't consume any additional elements from the
            // given source enumerable until there is sufficient capacity to process the
            // current element. i.e., this transmits backpressure from the rate limiter to
            // the source enumerable.
            var lease = await rateLimiter.AcquireAsync(permitCount: 1, cancellationToken);
            yield return (element, lease);
        }
    }
}
```
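The wrapper would be invoked with a `ConcurrencyLimiter` (which derives from `RateLimiter`), for example as follows (`records`, `WriteRecordAsync`, and the limits are illustrative):

```csharp
using System.Threading.RateLimiting;

// Shared by every caller, so the 1,000-permit cap applies process-wide.
var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 1_000,          // at most 1,000 concurrent writes
    QueueLimit = int.MaxValue,    // queue all waiting acquisitions
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst
});

await ForEachAsync(records, WriteRecordAsync, limiter, cancellationToken);
```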
However, this relies on:

1. the `RateLimited` enumerable guaranteeing that the lease, once acquired, is always yielded to the consumer of the enumerable (i.e., this cannot be intercepted by an exception or cancellation); and
2. the `Parallel.ForEachAsync` method guaranteeing that once its `source` enumerable yields an element (i.e., once `MoveNextAsync` has returned true), the `body` is invoked with the yielded element.

This is necessary to ensure that every lease acquired from the rate limiter is disposed.
I have taken a look at the source code of the `Parallel.ForEachAsync` method and it seems to provide the required guarantee. However, that's an implementation detail that is not guaranteed to hold true in the future. Still, it seems this is my only option, other than copying/modifying the `Parallel.ForEachAsync` implementation.
> Parallel is in the netcoreapp shared framework and ConcurrencyLimiter is not; taking this dependency would require pulling it in

Understood. The requirements would also be met by allowing a `SemaphoreSlim` to be provided (as opposed to a `ConcurrencyLimiter`).
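A minimal sketch of that `SemaphoreSlim` variant, mirroring the wrapper above (the `Throttled` name is mine; the body would have to call `Release()` in a `finally`, just as the lease is disposed above):

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;

// Same backpressure idea as the RateLimiter version, but built on
// SemaphoreSlim, so it needs nothing outside the shared framework.
static async IAsyncEnumerable<TSource> Throttled<TSource>(
    IAsyncEnumerable<TSource> source, SemaphoreSlim semaphore,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    await foreach (var element in source.WithCancellation(cancellationToken))
    {
        // Don't pull the next element until a slot frees up; the consumer of
        // each yielded element is responsible for releasing the semaphore.
        await semaphore.WaitAsync(cancellationToken);
        yield return element;
    }
}
```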
Background and motivation

The maximum degree of parallelism of a `Parallel.ForEachAsync` invocation is constrained by setting the `ParallelOptions.MaxDegreeOfParallelism` property. However, this constrains the degree of parallelism per invocation, not globally.

If a method that invokes `Parallel.ForEachAsync` is itself invoked on multiple threads (e.g., per received HTTP request), then this has a multiplying effect and can overwhelm the server. For example, a web server may use a `ConcurrencyLimiter` to constrain itself to handling no more than 100 simultaneous requests, each of which invokes a method that invokes `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism` also set to 100. This results in the `body` function passed to the `Parallel.ForEachAsync` method being invoked with a maximum effective concurrency of 10,000.

Ideally, each invocation of `Parallel.ForEachAsync` could be constrained using a global `ConcurrencyLimiter` for that invocation. This could be achieved by adding a `ConcurrencyLimiter` property to the `ParallelOptions` class. Thus in the above example, a static/global `ConcurrencyLimiter` could be defined in the class that contains the method that invokes `Parallel.ForEachAsync` and given to the `ParallelOptions` parameter, providing a global (process-wide) limit on the number of concurrent tasks invoked by that invocation of `Parallel.ForEachAsync` across all HTTP requests.

Furthermore, other invocations of `Parallel.ForEachAsync` elsewhere in the code can each set/use their own global limits, each governed by its own `ConcurrencyLimiter` instance.

Note that while the `body` function passed to the `Parallel.ForEachAsync` method can use a `ConcurrencyLimiter` internally to limit the global concurrency of the given `body` function, the `Parallel.ForEachAsync` method would continue to consume items from the given `source` enumerable up to the permitted degree of parallelism specified by the `ParallelOptions.MaxDegreeOfParallelism` property. Thus if the `source` enumerable is an `IAsyncEnumerable<T>`, the items being enumerated are very large in number and not all in memory (i.e., streamed from network/storage), and `Parallel.ForEachAsync` consumes items from the given `source` enumerable faster than the given `body` function can process them, then this could lead to the server running out of memory.

However, if the `Parallel.ForEachAsync` method is `ConcurrencyLimiter`-aware, then it knows to consume the next item from the given `source` enumerable only when there is sufficient capacity to process it.

API Proposal
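The proposal's code snippet did not survive extraction; based on the background above, the proposed shape would presumably be along these lines (a sketch, not the issue's verbatim proposal):

```csharp
namespace System.Threading.Tasks
{
    public class ParallelOptions
    {
        // Existing members unchanged.

        // Proposed: when non-null, Parallel.ForEachAsync acquires a permit from
        // this limiter before consuming each element from the source enumerable,
        // yielding a process-wide cap shared across invocations.
        public System.Threading.RateLimiting.ConcurrencyLimiter? ConcurrencyLimiter { get; set; }
    }
}
```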
API Usage
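This snippet was likewise not preserved; usage consistent with the proposal might look like the following (`Record`, `WriteBatchAsync`, and `WriteRecordAsync` are hypothetical):

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

public sealed record Record;

public static class RecordWriter
{
    // One static limiter shared by all HTTP requests, capping concurrent
    // database writes process-wide rather than per ForEachAsync invocation.
    private static readonly ConcurrencyLimiter s_dbWriteLimiter = new(
        new ConcurrencyLimiterOptions
        {
            PermitLimit = 1_000,
            QueueLimit = int.MaxValue,
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst
        });

    public static Task WriteBatchAsync(IAsyncEnumerable<Record> records,
        CancellationToken cancellationToken) =>
        Parallel.ForEachAsync(records, new ParallelOptions
        {
            ConcurrencyLimiter = s_dbWriteLimiter, // the proposed property
            CancellationToken = cancellationToken
        }, WriteRecordAsync);

    private static ValueTask WriteRecordAsync(Record record, CancellationToken ct) =>
        default; // stand-in for the actual database write
}
```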
Alternative Designs
No response
Risks
No response