Open nbcks opened 9 months ago
Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.
Author: | nbcks |
---|---|
Assignees: | - |
Labels: | `area-System.IO` |
Milestone: | - |
I copied your code unmodified, and tested on my machine with different drives.
On NVMe drive C: loop 00:00:00.0717894 after when all 00:00:00.2501430
On HDD D: loop 00:00:09.3517668 after when all 00:00:09.3519552
On SATA SSD S: loop 00:00:02.3328987 after when all 00:00:02.3331167
Observing the directory with explorer, files are appearing sequentially.
It's indeed running asynchronously, but seems that it sometimes starts with processing sequentially, and after a while it starts to process in parallel.
You can try this code:
static async Task SimpleParallelWriteByteAsync()
{
var rootFilePath = "C:/temp/asynctest2";
var fileSize = 1024 * 1024 * 100;
if (Directory.Exists(rootFilePath))
Directory.Delete(rootFilePath, true);
Directory.CreateDirectory(rootFilePath);
var buffer = new byte[fileSize];
new Random().NextBytes(buffer);
IList<Task> writeTaskList = new List<Task>();
Stopwatch sw = Stopwatch.StartNew();
for (int index = 0; index <= 30; index += 1)
{
string fileName = $"file-{index:00}.txt";
string filePath = Path.Combine(rootFilePath, fileName);
writeTaskList.Add(Core(filePath));
}
async Task Core(string filePath)
{
using var handle = File.OpenHandle(filePath, FileMode.CreateNew, FileAccess.Write, FileShare.Read, FileOptions.Asynchronous);
Console.WriteLine("start");
await RandomAccess.WriteAsync(handle, buffer, 0).ConfigureAwait(false);
Console.WriteLine("done");
}
var timespanLoop = sw.Elapsed;
await Task.WhenAll(writeTaskList);
var timespanAfterWhenall = sw.Elapsed;
Console.WriteLine($"loop {timespanLoop} after when all {timespanAfterWhenall}");
}
await SimpleParallelWriteByteAsync();
The method Core
is exactly what File.WriteAllBytesAsync
does under the hood.
On my machine it produces:
start
done
start
done
start
done
start
done
start
done
start
done
start
done
start
done
start
done
start
done
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
start
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
done
loop 00:00:00.6775571 after when all 00:00:00.9536315
@hez2010 I can reproduce the same! Is there any way to tell whether this has happened or not? What is controlling whether this is async?
async
!= Parallel, and vice-versa.
Specifically with this behavior it's likely that your code is actually spinning up multiple threads to service the tasks, and as normal it just takes a little while for them to be allocated (there's a strategy for how fast that happens). You could try setting the min number of threads first and see if that improves things.
Note that theoretically a fully/ideal async
implementation would run all the tasks on one thread.....
Asynchronous execution means non-blocking execution. Parallel means running multiple things at the same time.
If you start n
async tasks there is no guarantee that they will be performed by n
threads at the same time. We need to fix these docs.
What exactly are you trying to achieve?
Asynchronous execution means non-blocking execution.
Doesn't the timing suggest it's blocking execution though?
00:00:05.6234609 after when all 00:00:05.6254815
The Tasks all completed at practically the same time as they were all enqueued.
async
!= Parallel, and vice-versa.Specifically with this behavior it's likely that your code is actually spinning up multiple threads to service the tasks, and as normal it just takes a little while for them to be allocated (there's a strategy for how fast that happens). You could try setting the min number of threads first and see if that improves things.
I think you are all missing the point. There aren't supposed to be any tasks being queued on the threadpool in the first place.
Overlapped IO means that the request is queued very quickly all the way into the kernel and eventually to the driver and disk (for DMA by the disk itself), and returns immediately. The callback is queued to an IO Completion Port, which is picked up by the IOCP wait thread, which marks the original Task as completed (this should be very fast). Actual dispatch of the Task continuation should only happen when you call Task.WhenAll
.
At least that's the theory, what happens is another question.
Note that theoretically a fully/ideal
async
implementation would run all the tasks on one thread.....
Yes, that's what you'd expect to happen if Overlapped was working properly. There is clearly something broken here.
@adamsitnik
Why is this tagged https://github.com/dotnet/runtime/labels/documentation? This has nothing to do with the docs, it's an outright bug report.
No threads are being (or should be) created here. it's just a straight Overlapped IO request, with the continuation callback posted to the (possibly single) IOCP thread.
Maybe this is due to how the underlying win32 APIs work? Have you tried whether they have the same issue?
Asynchronous execution means non-blocking execution. Parallel means running multiple things at the same time.
If you start
n
async tasks there is no guarantee that they will be performed byn
threads at the same time. We need to fix these docs.What exactly are you trying to achieve?
Nobody said that we expect threads to be created. You're misunderstanding the issue. We just expect the requests to be async (overlapped).
@adamsitnik This is still tagged incorrectly. This is a bug report, not a docs issue.
What File.WriteAllBytesAsync
does on every OS:
.NET can't open a file handle in async way, because none of the OSes that we support exposes an API for doing that. That is why we have not exposed methods like File.OpenAsync
: https://github.com/dotnet/runtime/issues/24698#issuecomment-767631566
Using Stopwatch
does not prove that it's not async, it just shows that for this particular configuration opening files is much faster on Linux than on Windows.
If you ever want to see if something is async or not on Windows you should use Concurrency Visualizer. It's a free plugin for VS that was design to help solve concurrency problems.
The write operation is 100% async on Windows and it's implemented here:
On Unix it's just a sync file IO operation scheduled to the ThreadPool
and executed by a thread pool thread.
And it's a documentation issue because the referenced doc states that:
The following examples demonstrate parallel processing by writing 10 text files.
While it does not perform that in parallel. It just starts 10 async operations that may be executed in parallel, but most likely won't all be executed in parallel at the same time.
Description
Calls to both new System.IO.FilesStream(... async: true) and Systeml.IO.ReadAllBytesAsync() both execute synchronously on windows. On Linux they work as expected. This behaviour was replicated on .NET 6, .NET 8 and .NET Framework 4.7.2 on windows and replicated to work correctly on linux on .NET 8. On Windows, this was tested on two machines:
The same behaviour was seen in both.
On linux, this was tested on a laptop with a local NVMe SSD.
The bug is very easy to replicate, with code below which is based on https://learn.microsoft.com/en-us/dotnet/csharp/asynchronous-programming/using-async-for-file-access#parallel-asynchronous-io
Reproduction Steps
Run this code:
Expected behavior
The expected behaviour is that the loop should finish very rapidly, as the main chunk of IO has not started yet, just been queued by the operating system. The bulk of the work should happen in the operating system IO system, and the C# program should spend most of the time in await Task.WhenAll. This is what happens on linux:
oop 00:00:00.0082008 after when all 00:00:03.0910144
Actual behavior
On several machines on windows, I got:
00:00:05.6234609 after when all 00:00:05.6254815
Note the the lack of time spent in Task.WhenAll, which implies the work is getting done synchronously. This is further confirmed that replacing the await with:
writeTaskList.Add(Task.Run(() => File.WriteAllBytesAsync(filePath, buffer)));
Indeed makes the difference appear with:
loop 00:00:00.0089046 after when all 00:00:01.2339927
Regression?
This appears to have never worked, with the same behaviour on .NET Framework 4.7.2, .NET 6 and .NET 8 so is not a regression.
Known Workarounds
Use Linux.
Configuration
Confirmed to not work on Windows Citrix VDI on .NET 8, .NET 6 and .NET Framework 4.7.2. Confirmed to not work on .NET 6 on a Windows Desktop. Both x64.
Confirmed to work on linux .NET 8.0 Debian 12 x64.
Other information
Originally I believed it might be a windows configuration. However I posted here https://stackoverflow.com/questions/78036177/async-io-executing-synchronously and it appears others replicate the same behaviour!