axboe / fio

Flexible I/O Tester
GNU General Public License v2.0

Poor windowsaio performance due to synchronization of writes unless --overwrite 1 is given #833

Closed tripped closed 4 years ago

tripped commented 4 years ago

I'm using fio to try to benchmark high-throughput, pipelined writes to networked storage on Windows, and I'm running into a slightly annoying issue - in order to actually get good performance, I have to pass --overwrite=1 -- which has a couple of small problems:

  1. It causes the entire file to be written twice (and the first time is rather slow), needlessly consuming time;
  2. It means that the test is measuring something slightly different from what is intended, since the write operations do not induce allocation.

On Windows, all appends get synchronized; this is true even when using unbuffered, overlapped IO. The way to get around this issue is to set the length of the file before issuing IO, generally by using the Win32 SetEndOfFile function. (Note the MS documentation I linked says to use SetFileValidData; I suspect this may be an error; using SetEndOfFile to create a sparse file seems to be the "right" way to do this, and this is what Windows Explorer does when copying a file.)

So, the behavior I want is for fio to extend the file sparsely before doing the write workload. I read extend_file (filesetup.c), and it does not appear to support this behavior right now - any time it truncates to extend the size of the file, it appears it will also write data to the entire file.

I'd like to craft a small PR to support this, but I figured I'd float the idea first since there are a couple different ways one might do it. My first thought was to just add a flag to the thread options, something like --extend-only-truncates (perhaps with a better name), which just changes the behavior of extend_file() to do the truncate but not the writes. On the other hand, maybe this could be something specific to the windowsaio engine, since that's the only place that I'm aware of where you'll see this behavior.

Any thoughts? Is there perhaps a better way to achieve this?
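
For illustration, the two ways of reaching the target file size discussed above can be sketched in Python. This is only a sketch of the general idea using the language's portable file API, not fio's extend_file(); the helper names extend_by_truncate and extend_by_writing are hypothetical, and the sketch makes no claim about how a given runtime implements truncate on Windows.

```python
import os
import tempfile


def extend_by_truncate(path: str, size: int) -> None:
    """Set the file length to `size` without writing any data ourselves."""
    with open(path, "wb") as f:
        f.truncate(size)


def extend_by_writing(path: str, size: int, block_size: int = 64 * 1024) -> None:
    """Lay the file out by writing zeros for its full length."""
    zeros = bytes(block_size)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = min(block_size, remaining)
            f.write(zeros[:chunk])
            remaining -= chunk


if __name__ == "__main__":
    size = 8 * 1024 * 1024
    with tempfile.TemporaryDirectory() as d:
        truncated = os.path.join(d, "truncated.tmp")
        written = os.path.join(d, "written.tmp")
        extend_by_truncate(truncated, size)
        extend_by_writing(written, size)
        # Both files report the same logical length; only the second one
        # had its full contents written by the caller.
        print(os.path.getsize(truncated), os.path.getsize(written))
```

The request in this issue amounts to asking for the first helper's behavior as an option, so that the benchmark's own writes are the first ones to touch the file.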

sitsofe commented 4 years ago

@tripped If you can cope with the file being recreated entirely and preallocated then I would suggest we do the following:

  1. Implement a fallocate path for Windows by doing [Create File]; SetFilePointerEx([...]); SetEndOfFile([...]). This will try to ensure best contiguity and make the kernel zero the file for us, which should be faster than doing it ourselves. Update: It turns out this is a bad idea.
  2. Add a new option fallocate=insecure that does a [Create File]; SetFileValidData([...]) and bails out with a helpful error message about permissions if it fails. This will require the user to have appropriate permissions and will expose deleted data but will be extremely fast. For now such an option will be windows only.

But what I can't tell is whether you actually want the file to start out sparse and actually WANT to include the overhead of "filling in" the holes! If you want the file to be sparse we need to be explicit on Windows by doing something like: [Create file], DeviceIoControl( fHandle, FSCTL_SET_SPARSE, [...]); SetFilePointerEx([...]); DeviceIoControl( fHandle, FSCTL_SET_ZERO_DATA, [...]). If you want multiple holes in a file on Windows you have to explicitly say so, otherwise you only get holes at the end of files due to seek extending.

There is a general issue regarding file extending when using overwrite=1. Arguably we should be trying to use fallocate style methods of extension (complete with fallbacks) to try our best to ensure that the new region is not sparse. Then we can assume that if someone really wants fio to work with sparse files they can make such files at the correct size in advance and use overwrite=1. Having said that, if the previous suggestions are good enough we can skirt this issue for now.

sitsofe commented 4 years ago

So it turns out my idea 1. ([Create File]; SetFilePointerEx([...]); SetEndOfFile([...])) has a flaw: if you do a write to the middle of a file created this way then Windows will stall the write while it backfills the file up to that point with zeros. To counter this you would have to follow up with a write at the end to force any zero filling to happen at layout time (which half defeats the reason we use fallocate). The post https://superuser.com/a/274867/ from MS dev Larry Osterman says that using SetEndOfFile gives a hint to the filesystem for optimal layout (so long as it's done before the first write), which makes it sound like this is most useful when combined with layout writes that grow the file to its final size. Given this, it seems like only the insecure expose-old-data technique (2.) would actually speed up initial layout (@axboe any thoughts on this?).

There might be another way to preallocate on Windows when the filesystem supports sparse files but if so no one talks about it. From https://docs.microsoft.com/en-gb/windows/win32/api/winioctl/ni-winioctl-fsctl_set_zero_data?redirectedfrom=MSDN :

If you use the WriteFile function to write zeros (0) to a sparse file, the file system allocates disk space for the data that you are writing. If you use the FSCTL_SET_ZERO_DATA control code to write zeros (0) to a sparse file and the zero (0) region is large enough, the file system may not allocate disk space.

The question is whether those zeros actually get written to disk (it's not that clear whether that happens).

sitsofe commented 4 years ago

@tripped any thoughts?

tripped commented 4 years ago

@sitsofe Ah, sorry if I was unclear! The behavior I want is pretty much just to emulate the behavior of Windows copy: CreateFile, SetFilePointer, SetEndOfFile, SetFilePointer back to 0, then a bunch of sequential WriteFile calls. Hence my initial suggestion of a "truncate-only" flag or something similar, since on Windows the truncate call ends up just calling SetEndOfFile.

I wasn't aware of the backfill issue, that's really interesting. So it seems like there are a few different ways of going about extending a file on Windows:

  1. SetEndOfFile. Will incur a backfill if you write non-sequentially. What Windows copy does.
  2. Make the file sparse with FSCTL_SET_SPARSE/FSCTL_SET_ZERO_DATA. No problem with backfill, but like option 1, no blocks are allocated for the file initially. For, e.g., SMB servers, also depends on the underlying storage supporting these FSCTLs.
  3. SetFileValidData - more analogous to POSIX fallocate, allocates space for the file but does not require zeroing or transferring data. Also depends on the storage supporting certain control codes.
  4. Actually just write out the full file size worth of data. The most portable option. :-)

(4) is the current behavior of fio with --overwrite=1. What I'd love is a way to get it to behave like (1) - the motivation being to get Windows to stop synchronizing the writes, while still stressing the full write path of the storage (including allocation).
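
As a rough illustration of option (1), the copy-style sequence described above (set the end of file first, move the pointer back to 0, then write sequentially) might look like the following Python sketch. It rests on stated assumptions: the helper names set_end_of_file and copy_style_write are hypothetical, the Windows branch calls SetFilePointerEx and SetEndOfFile through ctypes, and on other platforms os.ftruncate is used as a stand-in. It is not fio code and has not been validated against any particular storage.

```python
import os
import sys


def set_end_of_file(fd: int, size: int) -> None:
    """Set the length of an open file to `size` bytes without writing data."""
    if sys.platform == "win32":
        import ctypes
        import msvcrt
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.SetFilePointerEx.argtypes = [
            wintypes.HANDLE,
            wintypes.LARGE_INTEGER,
            ctypes.POINTER(wintypes.LARGE_INTEGER),
            wintypes.DWORD,
        ]
        kernel32.SetFilePointerEx.restype = wintypes.BOOL
        kernel32.SetEndOfFile.argtypes = [wintypes.HANDLE]
        kernel32.SetEndOfFile.restype = wintypes.BOOL

        file_begin = 0  # FILE_BEGIN move method
        handle = msvcrt.get_osfhandle(fd)
        # Move the pointer to the target size, mark that as end of file,
        # then move the pointer back to offset 0 for the sequential writes.
        for offset, mark_eof in ((size, True), (0, False)):
            if not kernel32.SetFilePointerEx(handle, offset, None, file_begin):
                raise ctypes.WinError(ctypes.get_last_error())
            if mark_eof and not kernel32.SetEndOfFile(handle):
                raise ctypes.WinError(ctypes.get_last_error())
    else:
        # Assumption: ftruncate stands in for SetEndOfFile on non-Windows hosts.
        os.ftruncate(fd, size)


def copy_style_write(path: str, data: bytes, block_size: int = 64 * 1024) -> None:
    """Create `path`, set its final length up front, then write sequentially."""
    flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        set_end_of_file(fd, len(data))
        for start in range(0, len(data), block_size):
            chunk = data[start:start + block_size]
            written = 0
            while written < len(chunk):
                written += os.write(fd, chunk[written:])
    finally:
        os.close(fd)
```

Because the writes here are strictly sequential from offset 0, this pattern would not be expected to trigger the backfill stall mentioned earlier in the thread, which applies to writes landing beyond the data written so far.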

Gt3pccb commented 4 years ago

@tripped, this has been an issue with FIO and I really appreciate you presenting the case so eloquently. I use FIO quite a bit to test CSVs, Spaces and ReFS. Having to overwrite not only takes a long time; overwrites (data-in-place writes) are also, by their nature, not always desirable. Let me know if there is anything I can help with testing.

Gt3pccb commented 4 years ago

@sitsofe, I tried your suggestion and it did not make any changes.

sitsofe commented 4 years ago

@Gt3pccb which suggestion was that? Maybe you mean to post this reply to a different issue?

@tripped I've come around to your suggestion. What do you think of introducing a fallocate=truncate mode (perhaps with a better name - perhaps fallocate=seek?) that does a truncate on *nixes and a SetFilePointerEx/SetEndOfFile on Windows?

tripped commented 4 years ago

@sitsofe that sounds like it would work quite well for my purposes! I vaguely wonder if fallocate=truncate is misleading or contradictory, though, since (at least as I understand it), the purpose of fallocate is to allocate blocks for a file, and the purpose of truncate is to not allocate blocks for the file. ;-)

Maybe that's not super important though? I'd be happy to put together a PR for it; I've been tinkering around in that area and made a prototype I could clean up.

Gt3pccb commented 4 years ago

@Trip, have you looked at why the FIO specified queue depth does not get translated into a physical disk queue depth when we are doing first allocations? With overwrites FIO queued depth = physical disk queue depth. We tried to look at why this happens but without FIO’s symbols we weren’t able to. Do we have symbols/PDBs for FIO that I can use in KD or Xperf? Thanks

Gt3pccb commented 4 years ago

I also noticed that the queue depth indicated in FIO does not translate into PHY disk queue depth when first allocations are happening. I would need the symbols for FIO to troubleshoot further because our stack appears to be waiting

Gt3pccb commented 4 years ago

@Trip Can you share your binaries to see if they help on my environment? Thanks

sitsofe commented 4 years ago

@Gt3pccb note that the desired queue depth specified via iodepth need not be the number that is actually achieved. You would need to check fio's output (e.g. IO depths) to know what was at least sent down (see https://fio.readthedocs.io/en/latest/fio_doc.html#interpreting-the-output for some more detail).

Gt3pccb commented 4 years ago

@sitsofe these are my observations while testing the next gen of PCIe Gen 4 NVMe Rulers. Each ruler is capable of 3.5GBps. We are using 6 of these rulers in RAID0 using DDR4 X 8.

Using live KD:

With first allocations using --iodepth=256, in-flight IOs = between 1 and 2, which matches fio's IO depths:
IO depths : 1=98.2%, 2=1.7%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%

With data-in-place writes (i.e. overwrites) using --iodepth=256, in-flight IOs = +/- 256:
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
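
To compare the achieved queue depth across runs without reading the percentages by eye, the depth distribution reported above can be parsed with a few lines of Python. This is a minimal sketch that assumes only the "depth=percent%" bucket format quoted in this comment; the function names parse_io_depths and dominant_depth are hypothetical and are not part of fio.

```python
import re
from typing import Dict

# One bucket looks like "4=0.1%" or ">=64=100.0%".
_BUCKET = re.compile(r"(>=)?(\d+)=(\d+(?:\.\d+)?)%")


def parse_io_depths(text: str) -> Dict[str, float]:
    """Map each depth bucket label (e.g. "1", ">=64") to its percentage."""
    return {
        f"{prefix}{depth}": float(percent)
        for prefix, depth, percent in _BUCKET.findall(text)
    }


def dominant_depth(text: str) -> str:
    """Return the bucket label that holds the largest share of IOs."""
    buckets = parse_io_depths(text)
    return max(buckets, key=buckets.get)


if __name__ == "__main__":
    first_alloc = "1=98.2%, 2=1.7%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%"
    overwrite = "1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%"
    print(dominant_depth(first_alloc), dominant_depth(overwrite))
```

Run on the two distributions quoted above, the dominant bucket is "1" for first allocations and ">=64" for overwrites, which is the gap being reported.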

Using other perf tools (not as flexible as FIO) we do not see this behavior. We would need FIO's symbols for Windows so we can RCA.

I also tested FIO with sparse files (we inserted a filter driver in the Windows stack that intercepts IOs and creates a sparse file, similar to this process):

fsutil file createNew d:\DataData\2\900GB.SPARSE 0
fsutil sparse setFlag d:\DataData\2\900GB.SPARSE 1
fsutil file setEOF d:\DataData\2\900GB.SPARSE 966367641600

This did not change the behavior.

If you would like me to RCA this issue, can you please point me to FIO's symbols for Windows? Thanks.

sitsofe commented 4 years ago

@Gt3pccb So we're getting a little off-topic, but you would have to go out of your way to have an fio built without symbols (we don't strip them and they default to on). The snag is likely that they are DWARF symbols (which work fine when using gdb) rather than Windows PDB (see https://stackoverflow.com/questions/19269350/how-to-generate-pdb-files-while-building-library-using-mingw for more data).

tripped commented 4 years ago

I also noticed that the queue depth indicated in FIO does not translate into PHY disk queue depth when first allocations are happening. I would need the symbols for FIO to troubleshoot further because our stack appears to be waiting

@Gt3pccb - I'm not sure what you mean exactly; I think it's expected (?) that the initial layout is just a series of sequential writes. If you look at the code in extend_file that does this, it's just calling write in a loop: https://github.com/axboe/fio/blob/master/filesetup.c#L200

Gt3pccb commented 4 years ago

FIO is the only one of the 3 perf tools that behaves this way.

With CFfsTest (internal tool) and DiskSpd (public), queues do translate into Phy Disk queues even on first allocation.

Once FIO lays out the file, then and only then do FIO queues get translated into Phy Disk queues.

sitsofe commented 4 years ago

So I was all set to dismiss @Gt3pccb out of hand but I've just done the following:

./fio --name=layout --filename='D\:\fio.tmp' --eta=always --size=5G --bs=4k --iodepth=16 --rw=write --number_ios=1
fsutil.exe file layout D:\fio.tmp

and it said only one cluster had been allocated so @Gt3pccb is absolutely correct!

I then remembered that when operating on a write-only job fio doesn't bother to allocate the file up-front because what's the point (except of course there is a point in certain circumstances)? See https://github.com/axboe/fio/blob/8c302eb9706963e07d6d79998e15bede77b94520/filesetup.c#L119 and this also explains why overwrite=1 helps (because it forces layout to happen).

(at this stage I will note that Diskspd roughly does 1. followed by 3. if it can otherwise 4. from @tripped's options)

If Windows fio gets a fallocate then this issue will be bypassed (assuming clusters actually get allocated)... Thoughts?
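
A quick way to make the same "how much is really allocated" check on a POSIX system, in the spirit of the fsutil file layout command used above, is to compare a file's logical size with its allocated blocks. The sketch below is illustrative only: the function name allocation_report is hypothetical, st_blocks is not available from os.stat on Windows, and the 512-byte block unit is an assumption that holds on Linux.

```python
import os
from typing import Tuple


def allocation_report(path: str) -> Tuple[int, int]:
    """Return (logical_size_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    allocated = st.st_blocks * 512
    return st.st_size, allocated


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        logical, allocated = allocation_report(name)
        print(f"{name}: logical={logical} allocated={allocated}")
```

A file whose allocated size is far below its logical size has not been laid out yet, which is the situation the single-cluster result above points to.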

Gt3pccb commented 4 years ago

Thanks, I will give the process a try.

I can tell you what appears to be the difference among the 3 tools: CfStest and DiskSpd create the file, then set the EOF and valid data. We leverage NtCreateFile. DiskSpd and CfStest then append, while FIO appears to extend.

Does this make sense to you?

Thanks Astolfo

Gt3pccb commented 4 years ago

I am glad you did not dismiss me!

I tried overwrite=1 and it does not make much difference, in the sense that you still “lay” the file out as an extend instead of an append.

There is a significant difference between first allocations and overwrites at the file system level.

My point is that, as far as performance goes, first allocations are as fast as overwrites; with FIO, however, first allocations are orders of magnitude slower than with other testing apps.

I am now supporting the file system team. We are targeting file system performance beyond what PCIe Gen 4 will give us, on file systems larger than 10PB, and for now we need to focus on first-allocation performance.

I hope this makes sense as why I need this resolved.

thanks

tripped commented 4 years ago

Just opened a PR based on @sitsofe's suggestion to add an fallocate=truncate option. It's a fairly small change, but I believe this is all that's necessary to address my problems with iodepth on Windows.

@Gt3pccb I think your issue is at least potentially related to mine - do you think this change will help you as well? If I understand correctly if you just pass --fallocate=truncate you'll get what you want on Windows - an initial SetEndOfFile followed by allocating writes which are pipelined.

Gt3pccb commented 4 years ago

@Trip you are talking about the file system and I am not familiar with this layer; I work on hardware microcode. CFS test and Diskspd call NtCreateFile, which appears to set the EOF and valid data (or something similar to fsutil), and then they append, while FIO appears to extend (without setting the EOF, which implies no valid data).

As for --fallocate=truncate, it looks like FIO does not like it:

fio --thread --direct=1 --ioengine=windowsaio --offset=0 --filesize=100GB --rw=write --iodepth=256 --bs=256K --nrfiles=1 --name=RandomSize --numjobs=1 - --directory=v\:\ --fallocate=truncate
Option : Your platform does not support fallocate
Bad option

tripped commented 4 years ago

As for --fallocate=truncate, it looks like FIO does not like it:

fio --thread --direct=1 --ioengine=windowsaio --offset=0 --filesize=100GB --rw=write --iodepth=256 --bs=256K --nrfiles=1 --name=RandomSize --numjobs=1 - --directory=v\:\ --fallocate=truncate
Option : Your platform does not support fallocate
Bad option

Ah, sorry - I meant that in reference to my pull request (https://github.com/axboe/fio/pull/859) which adds that option.

extratype commented 3 years ago

Is --fallocate=truncate working as intended on Windows? ftruncate() is _chsize() in mingw-w64, and _chsize() extends the file by zero-filling it using synchronous, buffered I/O:

Process Monitor:
17:34:00.2624323  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 4,096, Length: 4,096
17:34:00.2624870  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 16,384, Length: 4,096
17:34:00.2625429  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 32,768, Length: 4,096
17:34:00.2627554  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 65,536, Length: 4,096
....
17:34:08.0918914  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 2,147,155,968, Length: 4,096
17:34:08.0921128  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 2,147,221,504, Length: 4,096
17:34:08.0923003  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 2,147,287,040, Length: 4,096
17:34:08.0925019  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 2,147,352,576, Length: 4,096
17:34:08.0926989  test.exe  FASTIO_WRITE  D:\a.out  FAST IO DISALLOWED  Offset: 2,147,418,112, Length: 4,096

Stack:
0  FLTMGR.SYS  FltpPerformPreCallbacksWorker + 0x36b
1  FLTMGR.SYS  FltpPassThroughFastIo + 0xc0
2  FLTMGR.SYS  FltpFastIoWrite + 0x165
3  ntoskrnl.exe  NtWriteFile + 0x43d
4  ntoskrnl.exe  KiSystemServiceCopyEnd + 0x28
5  ntdll.dll  NtWriteFile + 0x14
6  KernelBase.dll  WriteFile + 0x76
7  ucrtbase.dll  write_binary_nolock + 0x52
8  ucrtbase.dll  _write_nolock + 0xb2
9  ucrtbase.dll  chsize_nolock + 0xb0
10  ucrtbase.dll  __crt_seh_guarded_call<int>::operator()<<lambda_63911f43cc86614963252b1f423aad40>,<lambda_522ac1b1e1a1180d6f374a882b81d1c8> &,<lambda_6da5eaf298b4749753c721cc8c61e051> > + 0x6f
11  ucrtbase.dll  chsize_s + 0xa4
12  ucrtbase.dll  chsize + 0xc
13  test.exe  wmain + 0x48, D:\test\main.cpp(254)
14  test.exe  __scrt_common_main_seh + 0x10c, d:\agent\_work\4\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(281)

So this behavior matches (4), not (1), in the list quoted below:

I wasn't aware of the backfill issue, that's really interesting. So it seems like there are a few different ways of going about extending a file on Windows:

  1. SetEndOfFile. Will incur a backfill if you write non-sequentially. What Windows copy does.
  2. Make the file sparse with FSCTL_SET_SPARSE/FSCTL_SET_ZERO_DATA. No problem with backfill, but like option 1, no blocks are allocated for the file initially. For, e.g., SMB servers, also depends on the underlying storage supporting these FSCTLs.
  3. SetFileValidData - more analogous to POSIX fallocate, allocates space for the file but does not require zeroing or transferring data. Also depends on the storage supporting certain control codes.
  4. Actually just write out the full file size worth of data. The most portable option. :-)

sitsofe commented 3 years ago

@extratype That's a fair point - would you like to open a separate issue for that?