Open · yar-shukan opened 5 months ago
@ealsur, @Pilchie, a friendly reminder about the PR so we can make progress with it. Do you have any concerns about it, or can we move on? Thanks!
Running a benchmark (see code below) with bombardier (example run: `.\bombardier-windows-amd64.exe https://localhost:5001/api/diagscenario/withtaskrun/500 -d 90s -c 500`), I came to the conclusion that this change isn't worth it: I can't see any significant difference or improvement in throughput/latency, and I can't see any difference in the dotnet-counters `ThreadPool Thread Count` or `ThreadPool Queue Length` values.
```csharp
[HttpGet]
[Route("withtaskrun/{taskCount}")]
public async Task<ActionResult<int>> WithTaskRun(int taskCount)
{
    _logger.LogInformation($"Started with: #{Interlocked.Increment(ref _counterStarted)}");
    var semaphore = new SemaphoreSlim(parallelization, parallelization);
    var tasks = new List<Task<Customer>>();
    for (int i = 0; i < taskCount; i++)
    {
        tasks.Add(Task.Run(async () =>
        {
            await semaphore.WaitAsync().ConfigureAwait(false);
            try
            {
                return await PretendQueryCustomerFromDbAsync(Guid.NewGuid().ToString()).ConfigureAwait(false);
            }
            finally
            {
                semaphore.Release();
            }
        }));
    }

    await Task.WhenAll(tasks).ConfigureAwait(false);
    var completed = Interlocked.Increment(ref _counterCompleted);
    _logger.LogInformation($"Completed with: #{completed}");
    return completed;
}

[HttpGet]
[Route("withouttaskrun/{taskCount}")]
public async Task<ActionResult<int>> WithoutTaskRun(int taskCount)
{
    _logger.LogInformation($"Started without: #{Interlocked.Increment(ref _counterStarted)}");
    var semaphore = new SemaphoreSlim(parallelization, parallelization);
    var tasks = new List<Task<Customer>>();
    for (int i = 0; i < taskCount; i++)
    {
        tasks.Add(PretendQueryCustomerFromDbAsync(semaphore));
    }

    await Task.WhenAll(tasks).ConfigureAwait(false);
    var completed = Interlocked.Increment(ref _counterCompleted);
    _logger.LogInformation($"Completed without: #{completed}");
    return completed;
}

async Task<Customer> PretendQueryCustomerFromDbAsync(SemaphoreSlim semaphore)
{
    await semaphore.WaitAsync().ConfigureAwait(false);
    try
    {
        return await PretendQueryCustomerFromDbAsync(Guid.NewGuid().ToString()).ConfigureAwait(false);
    }
    finally
    {
        semaphore.Release();
    }
}

async Task<Customer> PretendQueryCustomerFromDbAsync(string customerId)
{
    int c = 0;
    var r = new Random();
    for (int i = 0; i < 100; i++)
    {
        c += r.Next(0, 100);
    }

    await Task.Delay(100).ConfigureAwait(false);

    for (int i = 0; i < 100; i++)
    {
        c += r.Next(0, 100);
    }

    return new Customer(customerId + c.ToString());
}
```
Semaphore | Connections | ThreadCount | WithTaskRun: Reqs/sec | WithoutTaskRun: Reqs/sec | WithTaskRun: Latency(ms) | WithoutTaskRun: Latency(ms) |
---|---|---|---|---|---|---|
40 | 125 | 1 | 1228 | 1207 | 108 | 107.94 |
40 | 125 | 3 | 1189.41 | 1192.93 | 108.17 | 108.15 |
40 | 125 | 10 | 1196.73 | 1217.05 | 108.15 | 108.07 |
40 | 125 | 50 | 607.09 | 587.95 | 215.92 | 216.36 |
40 | 125 | 100 | 415.85 | 396.91 | 323.66 | 323.26 |
40 | 125 | 500 | 88.94 | 89.05 | 1420 | 1400 |
40 | 500 | 1 | 2244.22 | 2126.54 | 228.45 | 253.18 |
40 | 500 | 3 | 2336.36 | 2416.46 | 222.33 | 215.54 |
40 | 500 | 10 | 2221.78 | 2366.41 | 232.82 | 219.07 |
40 | 500 | 50 | 2381.99 | 2326.66 | 216.49 | 219.06 |
40 | 500 | 100 | 1576.7 | 1592.65 | 324.74 | 324.52 |
40 | 500 | 500 | 382.75 | 367.4 | 1410 | 1410 |
Thank you @yar-shukan for doing a thorough analysis. `Task.Run` has been suspected a few times already, and it certainly puts a spiky load on the thread pool, so the real value is controlling burstability. Assuming there is no perf impact, I am inclined towards removing `Task.Run` (i.e. committing this PR), as there are no known side effects. Thoughts?
@kirankumarkolli I couldn't really see any difference between the two approaches: both `ThreadPool Thread Count` and `ThreadPool Queue Length` behave the same, and in my test `ThreadPool Thread Count` never went above the `SemaphoreSlim` `initialCount` parameter (I used 40 in my tests, simulating the `Environment.ProcessorCount = 4` case in the codebase). I don't fully understand why, though.
In theory the version without `Task.Run` should be more lightweight, but I could not prove it with the test above.
@stephentoub, your review and comments on this case would be very helpful. Thanks!
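For illustration only (not part of the benchmark or the SDK), here is a minimal console sketch of the behavior the no-`Task.Run` version relies on: an async method executes synchronously on the calling thread up to its first await of an incomplete task, so adding the returned task to the list involves no extra queueing up front.

```csharp
using System;
using System.Threading.Tasks;

class SyncUntilFirstAwaitSketch
{
    static async Task Main()
    {
        Console.WriteLine($"Caller thread: {Environment.CurrentManagedThreadId}");

        // WorkAsync runs on the caller's thread until it awaits Task.Delay,
        // at which point it yields and returns an in-flight Task to the caller.
        Task t = WorkAsync();
        Console.WriteLine("WorkAsync has yielded back to the caller.");
        await t;
    }

    static async Task WorkAsync()
    {
        // Printed from the same thread as the caller.
        Console.WriteLine($"Before first await, thread: {Environment.CurrentManagedThreadId}");
        await Task.Delay(100).ConfigureAwait(false);
        // The continuation typically resumes on a thread pool thread.
        Console.WriteLine($"After first await, thread: {Environment.CurrentManagedThreadId}");
    }
}
```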
> why code doesn't do this instead
If I'm understanding the question right, one reason such a Task.Run might be used is if the body of the work does a non-trivial amount of synchronous work before getting to anything that might await. Imagine it was this:
```csharp
List<Task> tasks = new();
for (int i = 0; i < 100; i++)
{
    tasks.Add(ProcessAsync(i));
}

static async Task ProcessAsync(int i)
{
    Thread.Sleep(1_000); // representing some amount of work
    await SomethingElse();
}
```
With this scheme, that `Thread.Sleep` in ProcessAsync is part of the synchronous call and thus part of the loop, e.g. the 99th iteration of the loop won't even call ProcessAsync for over a minute (99 iterations × 1 second each), because the 99 iterations that came before it all did at least a second's worth of work. If instead it's:
```csharp
List<Task> tasks = new();
for (int i = 0; i < 100; i++)
{
    int iter = i;
    tasks.Add(Task.Run(() => ProcessAsync(iter)));
}

static async Task ProcessAsync(int i)
{
    Thread.Sleep(1_000); // representing some amount of work
    await SomethingElse();
}
```
now the only work the loop itself is doing is queueing 100 work items, such that none of those work items is at all dependent on any other iteration's up-front work completing before it's invoked.
I don't know if that's the case here, but that's the primary reason someone would opt to use a Task.Run in a case like this.
@stephentoub, in the case of this particular code it doesn't seem to do heavy CPU work (see details here): it does some `StringBuilder` work, but it doesn't look like a heavy operation.
Considering that this code might be executed 100 times in a loop (e.g. when the Cosmos DB collection has 100 partitions), my concern was that it would need thread pool threads instead of doing this on the calling thread, and that it could cause thread pool thread starvation.
But running a simulating benchmark (I used a for loop with 100 items, see the example above) I couldn't really spot a difference between the two versions of the code, with an async method or `Task.Run`, in terms of thread pool usage. That's the main part I am trying to understand: why?
Given that, I am not sure we really need to do anything with this `Task.Run` and switch to plain async methods instead.
> and it can cause thread pool thread starvation.
Why would it cause starvation? Is this thread synchronously blocking on something after kicking off those work items?
Starvation probably is not the right word here: the assumption was that the `Task.Run` version would use the thread pool more heavily than code that doesn't use `Task.Run` and instead uses the tasks returned by async methods, which execute on the calling thread until they hit an await that yields, e.g. on an I/O call (down the call stack) or on the `SemaphoreSlim.WaitAsync()` call.
It will queue a work item. That doesn't necessarily translate into heavier use of the pool, though. For example, there's a good chance this code is already running on the thread pool, in which case that queued task will end up going into the thread's local queue, and if there's no other thread available to pick up that work, this thread will just process it the next time it yields back to the thread pool's dispatch loop.
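A minimal, self-contained sketch (not from the SDK; the exact thread chosen is scheduler-dependent and can vary run to run) of the local-queue behavior described above: a task queued via `Task.Run` from a thread pool thread lands in that thread's local queue and is often picked up by the same thread once it yields back to the dispatch loop.

```csharp
using System;
using System.Threading.Tasks;

class LocalQueueSketch
{
    static async Task Main()
    {
        // The outer Task.Run puts us on a thread pool thread, mirroring an
        // ASP.NET Core request that is already running on the pool.
        await Task.Run(async () =>
        {
            Console.WriteLine($"Outer work item on thread {Environment.CurrentManagedThreadId}");

            // The inner Task.Run queues to this thread's local queue; when the
            // outer method awaits and yields back to the dispatch loop, the same
            // thread will often (though not always) pick the inner item up.
            Task inner = Task.Run(() =>
                Console.WriteLine($"Inner work item on thread {Environment.CurrentManagedThreadId}"));

            await inner.ConfigureAwait(false);
        });
    }
}
```

If the outer and inner thread ids usually match on your machine, that is consistent with the benchmark above showing no visible difference in the thread pool counters between the two endpoints.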
**Describe the bug**
The `ReadManyTaskHelperAsync` code has a suspicious `Task.Run` call that looks like a potential issue: `Task.Run` schedules the task on a thread pool thread and can potentially cause the thread pool to be exhausted. This `Task.Run` call looks unnecessary; instead of wrapping the async code in `Task.Run` (which can put higher pressure on the thread pool):

`ReadManyTaskHelperAsync -> for loop -> tasks.Add(Task.Run(async () =>`

why doesn't the code do this instead:

`ReadManyTaskHelperAsync -> for loop -> tasks.Add(DoStuffAsync(params, semaphore))`

**To Reproduce**
Run `ReadManyTaskHelperAsync`.

**Expected behavior**
`Task.Run` is not called during `ReadManyItemsAsync` execution. The thread pool is not used.

**Actual behavior**
`Task.Run` is called during `ReadManyItemsAsync` execution. The thread pool is used.

**Environment summary**
SDK Version: 3.39.1
OS Version: Windows

**Additional context**
There was a potential issue reported here, but it was never really proven that the issue came from this method call, and the reporter eventually mitigated it with another solution.
cc @ealsur