Closed tripped closed 4 years ago
@tripped If you can cope with the file being recreated entirely and preallocated then I would suggest we do the following:
1. A fallocate path for Windows by doing [Create File]; SetFilePointerEx([...]); SetEndOfFile([...]). This will try to ensure best contiguity and make the kernel zero the file for us, which should be faster than doing it ourselves. Update: It turns out this is a bad idea.
2. A fallocate=insecure that does a [Create File]; SetFileValidData([...]) and bails out with a helpful error message about permissions if it fails. This will require the user to have appropriate permissions and will expose deleted data but will be extremely fast. For now such an option will be Windows only.
But what I can't tell is whether you actually want the file to start out sparse and actually WANT to include the overhead of "filling in" the holes! If you want the file to be sparse we need to be explicit on Windows by doing something like: [Create file], DeviceIoControl( fHandle, FSCTL_SET_SPARSE, [...]); SetFilePointerEx([...]); DeviceIoControl( fHandle, FSCTL_SET_ZERO_DATA, [...]). If you want multiple holes in a file on Windows you have to explicitly say so, otherwise you only get holes at the end of files due to seek extending.
There is a general issue regarding file extending when using overwrite=1. Arguably we should be trying to use fallocate style methods of extension (complete with fallbacks) to try our best to ensure that the new region is not sparse. Then we can assume that if someone really wants fio to work with sparse files they can make such files at the correct size in advance and use overwrite=1. Having said that, if the previous suggestions are good enough we can skirt this issue for now.
So it turns out my idea 1. ([Create File]; SetFilePointerEx([...]); SetEndOfFile([...])) has a flaw: if you do a write to the middle of a file created this way then Windows will stall the write while it backfills the file up to that point with zeros. To counter this you would have to follow up with a write at the end to force any zero filling to happen at layout time (which half defeats the reason we use fallocate). The post https://superuser.com/a/274867/ from MS dev Larry Osterman says that using SetEndOfFile gives a hint to the filesystem for optimal layout (so long as it's done before the first write), which makes it sound like this is most useful when combined with layout writes that grow the file to its final size. Given this it seems like only the insecure expose-old-data technique (2.) would actually speed up initial layout (@axboe any thoughts on this?).
There might be another way to preallocate on Windows when the filesystem supports sparse files but if so no one talks about it. From https://docs.microsoft.com/en-gb/windows/win32/api/winioctl/ni-winioctl-fsctl_set_zero_data?redirectedfrom=MSDN :
If you use the WriteFile function to write zeros (0) to a sparse file, the file system allocates disk space for the data that you are writing. If you use the FSCTL_SET_ZERO_DATA control code to write zeros (0) to a sparse file and the zero (0) region is large enough, the file system may not allocate disk space.
The question is whether those zeros actually get written to disk (it's not that clear whether that happens).
@tripped any thoughts?
@sitsofe Ah, sorry if I was unclear! The behavior I want is pretty much just to emulate the behavior of Windows copy: CreateFile, SetFilePointer, SetEndOfFile, SetFilePointer back to 0, then a bunch of sequential WriteFile calls. Hence my initial suggestion of a "truncate-only" flag or something similar, since on Windows the truncate call ends up just calling SetEndOfFile.
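For readers who want to see the copy-style sequence described above in a portable form, here is a minimal Python sketch. It is only an analogy: `truncate()` and `seek()` stand in for SetFilePointer/SetEndOfFile, the function name `copy_style_write` and its parameters are hypothetical, and it makes no claim about how Windows copy or fio are actually implemented.

```python
import os
import tempfile


def copy_style_write(path: str, data: bytes, chunk_size: int = 64 * 1024) -> int:
    """Set the final file size first, then fill the file with sequential writes."""
    with open(path, "wb") as f:
        f.truncate(len(data))  # rough analogue of SetFilePointer + SetEndOfFile
        f.seek(0)              # rough analogue of SetFilePointer back to 0
        for offset in range(0, len(data), chunk_size):
            # Sequential writes, analogous to a run of WriteFile calls.
            f.write(data[offset:offset + chunk_size])
    return os.path.getsize(path)


if __name__ == "__main__":
    payload = os.urandom(1024 * 1024)
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "copy.tmp")
        print(copy_style_write(target, payload))
```

The point of the ordering is that the file already has its final length before the first data write is issued, so none of the later writes extends the file.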
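I wasn't aware of the backfill issue, that's really interesting. So it seems like there are a few different ways of going about extending a file on Windows:
1. SetEndOfFile. Will incur a backfill if you write non-sequentially. What Windows copy does.
2. Make the file sparse with FSCTL_SET_SPARSE/FSCTL_SET_ZERO_DATA. No problem with backfill, but like option 1, no blocks are allocated for the file initially. For, e.g., SMB servers, also depends on the underlying storage supporting these FSCTLs.
3. SetFileValidData - more analogous to POSIX fallocate, allocates space for the file but does not require zeroing or transferring data. Also depends on the storage supporting certain control codes.
4. Actually just write out the full file size worth of data. The most portable option. :-)
(4) is the current behavior of fio with --overwrite=1. What I'd love is a way to get it to behave like (1) - the motivation being to get Windows to stop synchronizing the writes, while still stressing the full write path of the storage (including allocation).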
@tripped, this has been an issue with FIO and I really appreciate you presenting the case so eloquently. I use FIO quite a bit to test CSVs, Spaces and ReFS, and having to overwrite not only takes a long time, but by their nature overwrites (data-in-place writes) are not always desirable. Let me know if there is anything I can help with testing.
@sitsofe, I tried your suggestion and it did not make any changes.
@Gt3pccb which suggestion was that? Maybe you mean to post this reply to a different issue?
@tripped I've come around to your suggestion. What do you think of introducing a fallocate=truncate mode (perhaps with a better name - perhaps fallocate=seek?) that does a truncate on *nixes and a SetFilePointerEx/SetEndOfFile on Windows?
@sitsofe that sounds like it would work quite well for my purposes! I vaguely wonder if fallocate=truncate is misleading or contradictory, though, since (at least as I understand it), the purpose of fallocate is to allocate blocks for a file, and the purpose of truncate is to not allocate blocks for the file. ;-)
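To make the distinction between extending and allocating concrete, the following Python sketch extends a file by truncation and then reports both its logical size and, where the platform exposes it, the space actually allocated. It is an illustration only: `extend_by_truncate` is a hypothetical helper, the 512-byte unit for `st_blocks` is the Linux convention, and whether the extended region ends up sparse depends on the filesystem.

```python
import os
import tempfile
from typing import Optional, Tuple


def extend_by_truncate(path: str, size: int) -> Tuple[int, Optional[int]]:
    """Resize path to size bytes without writing data.

    Returns (logical size, allocated bytes), where allocated bytes is None
    on platforms whose stat result has no st_blocks field.
    """
    with open(path, "ab") as f:  # creates the file if it does not exist
        f.truncate(size)
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    # st_blocks is counted in 512-byte units on Linux (assumed here).
    allocated = None if blocks is None else blocks * 512
    return st.st_size, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logical, allocated = extend_by_truncate(os.path.join(tmp, "t.tmp"), 64 * 1024 * 1024)
        print("logical size:", logical)
        print("allocated bytes:", allocated)
```

On filesystems that support sparse files the allocated figure is typically much smaller than the logical size, which is the behavior the naming concern above is about.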
Maybe that's not super important though? I'd be happy to put together a PR for it; I've been tinkering around in that area and made a prototype I could clean up.
@Trip, have you looked at why the FIO specified queue depth does not get translated into a physical disk queue depth when we are doing first allocations? With overwrites FIO queued depth = physical disk queue depth. We tried to look at why this happens but without FIO’s symbols we weren’t able to. Do we have symbols/PDBs for FIO that I can use in KD or Xperf? Thanks
I also noticed that the queue depth indicated in FIO does not translate into PHY disk queue depth when first allocations are happening. I would need the symbols for FIO to troubleshoot further because our stack appears to be waiting
@Trip Can you share your binaries to see if they help on my environment? Thanks
@Gt3pccb note that the desired queue depth specified via iodepth need not be the number that is actually achieved. You would need to check fio's output (e.g. IO depths) to know what was at least sent down (see https://fio.readthedocs.io/en/latest/fio_doc.html#interpreting-the-output for some more detail).
@sitsofe these are my observations while testing the next gen of PCIe Gen 4 NvMe Rulers. Each ruler is capable of 3.5GBps. We are using 6 of these rulers in RAID0 using DDR4 X 8
Using live KD:
With first allocations using --iodepth=256, in-flight IOs = between 1 and 2, which matches fio's IO depths : 1=98.2%, 2=1.7%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
With data-in-place allocations (i.e. overwrites) using --iodepth=256, in-flight IOs = +/- 256: 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%
Using other perf tools (not as flexible as FIO) we do not see this behavior. We would need FIO's symbols for Windows so we can RCA.
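When comparing many runs it can help to read the IO depths distribution programmatically rather than by eye. The Python sketch below parses a line in the format quoted above and reports the dominant bucket. The helper names are hypothetical and the parser assumes exactly the comma-separated `label=percent%` layout shown in this thread, not any other fio output format.

```python
from typing import Dict


def parse_io_depths(line: str) -> Dict[str, float]:
    """Parse an fio 'IO depths' line into {bucket label: percentage}."""
    body = line.split(":", 1)[-1]  # drop the optional "IO depths" label
    buckets: Dict[str, float] = {}
    for field in body.split(","):
        # rpartition keeps labels such as ">=64" intact.
        label, sep, value = field.strip().rpartition("=")
        if sep:
            buckets[label] = float(value.rstrip("%"))
    return buckets


def dominant_bucket(buckets: Dict[str, float]) -> str:
    """Return the depth bucket that accounts for the largest share of IOs."""
    return max(buckets, key=buckets.get)


if __name__ == "__main__":
    first_alloc = "IO depths : 1=98.2%, 2=1.7%, 4=0.1%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%"
    overwrite = "IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=100.0%"
    print(dominant_bucket(parse_io_depths(first_alloc)))  # 1
    print(dominant_bucket(parse_io_depths(overwrite)))    # >=64
```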
I also tested FIO with sparse files (we inserted a filter driver in the Windows stack that intercepts IOs and creates a sparse file, similar to this process):
fsutil file createNew d:\DataData\2\900GB.SPARSE 0
fsutil sparse setFlag d:\DataData\2\900GB.SPARSE 1
fsutil file setEOF d:\DataData\2\900GB.SPARSE 966367641600
and it did not change the behavior.
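As a quick sanity check on the size passed to setEOF above, the byte count corresponds to 900 GiB (binary gigabytes). The short Python sketch below does the arithmetic; `gib_to_bytes` is a hypothetical helper, not part of fsutil or fio.

```python
GIB = 1024 ** 3  # bytes in one binary gigabyte


def gib_to_bytes(gib: int) -> int:
    """Convert a size in GiB to bytes."""
    return gib * GIB


if __name__ == "__main__":
    print(gib_to_bytes(900))  # 966367641600
```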
If you would like me to RCA this issue, can you please point me to FIO's symbols for Windows? Thanks
@Gt3pccb So we're getting a little off-topic but you would have to go out of your way to have an fio built without symbols by default (we don't strip them and they default to on). The snag is likely that they are dwarf symbols (which work fine when using gdb) rather than Windows PDB (see https://stackoverflow.com/questions/19269350/how-to-generate-pdb-files-while-building-library-using-mingw for more data).
I also noticed that the queue depth indicated in FIO does not translate into PHY disk queue depth when first allocations are happening. I would need the symbols for FIO to troubleshoot further because our stack appears to be waiting
@Gt3pccb - I'm not sure what you mean exactly; I think it's expected (?) that the initial layout is just a series of sequential writes. If you look at the code in extend_file that does this, it's just calling write in a loop: https://github.com/axboe/fio/blob/master/filesetup.c#L200
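For illustration, a layout pass of this kind can be sketched in a few lines of Python. This is not fio's extend_file (which is C); `layout_file`, its parameters and the zero-filled buffer are assumptions made for the example, and it only shows the general shape of "write sequentially until the file reaches its target size".

```python
import os
import tempfile


def layout_file(path: str, size: int, block_size: int = 64 * 1024) -> int:
    """Lay out a file by writing zero-filled blocks sequentially up to size bytes."""
    block = bytes(block_size)  # one reusable buffer of zeros
    written = 0
    with open(path, "wb") as f:
        while written < size:
            n = min(block_size, size - written)  # final block may be short
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the layout writes to stable storage
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(layout_file(os.path.join(tmp, "layout.tmp"), 5 * 1024 * 1024))
```

Because every write goes out one after another from a single loop, a pass like this would be expected to show a queue depth of about one regardless of the job's iodepth setting.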
FIO is the only one of the 3 different perf tools that behaves this way.
With CFfsTest (internal tool) and DiskSpeed (public) queues do translate into Phy Disk even in first allocation.
Once FIO lays the file then and only then FIO queues get translated into Phy Disk queues.
From: Trip Volpe notifications@github.com Sent: Tuesday, October 29, 2019 3:33 PM To: axboe/fio fio@noreply.github.com Cc: Astolfo Rueda astolfor@microsoft.com; Mention mention@noreply.github.com Subject: Re: [axboe/fio] Poor windowsaio performance due to synchronization of writes unless --overwrite 1 is given (#833)
So I was all set to dismiss @Gt3pccb out of hand but I've just done the following:
./fio --name=layout --filename='D\:\fio.tmp' --eta=always --size=5G --bs=4k --iodepth=16 --rw=write --number_ios=1
fsutil.exe file layout D:\fio.tmp
and it said only one cluster had been allocated so @Gt3pccb is absolutely correct!
I then remembered that when operating on a write-only job fio doesn't bother to allocate the file up-front because what's the point (except of course there is a point in certain circumstances)? See https://github.com/axboe/fio/blob/8c302eb9706963e07d6d79998e15bede77b94520/filesetup.c#L119 and this also explains why overwrite=1 helps (because it forces layout to happen).
(at this stage I will note that Diskspd roughly does 1. followed by 3. if it can otherwise 4. from @tripped's options)
If Windows fio gets a fallocate then this issue will be bypassed (assuming clusters actually get allocated)... Thoughts?
Thanks, I will give the process a try.
I can tell you what appears to be the difference among the 3 different tools: CfStest and DiskSpeed create the file, then set the EOF and valid data. We leverage NtCreateFile. DiskSpeed and CFsTest then append, while FIO appears to extend.
Does this make sense to you?
Thanks Astolfo
I am glad you did not dismiss me!
I tried overwrite=1 and it does not make much difference, in the sense that you will "lay" the file as an extend instead of an append.
There is a significant difference between first allocations and overwrites at the file system level.
My point is that, as far as performance goes, first allocations are as fast as overwrites; however, with FIO first allocations are magnitudes slower than with other testing apps.
I am now supporting the file system team. We are focusing on file system performance beyond what PCIe Gen 4 will give us, on file systems larger than 10PB, and for now we need to focus on first allocation performance.
I hope this makes sense as to why I need this resolved.
thanks
Just opened a PR based on @sitsofe's suggestion to add an fallocate=truncate option. It's a fairly small change, but I believe this is all that's necessary to address my problems with iodepth on Windows.
@Gt3pccb I think your issue is at least potentially related to mine - do you think this change will help you as well? If I understand correctly if you just pass --fallocate=truncate you'll get what you want on Windows - an initial SetEndOfFile followed by allocating writes which are pipelined.
@Trip you are talking about the file system and I am not familiar with this layer; I work on hardware microcode. CFS test and Diskspd call NtCreateFile, which appears to set the EOF and valid data (or something similar to fsutil), and then they append, while FIO appears to extend (it does not set the EOF, which implies no valid data).
As for --fallocate=truncate, it looks like FIO does not like it:
fio --thread --direct=1 --ioengine=windowsaio --offset=0 --filesize=100GB --rw=write --iodepth=256 --bs=256K --nrfiles=1 --name=RandomSize --numjobs=1 - --directory=v\:\ --fallocate=truncate
Option : Your platform does not support fallocate Bad option
Ah, sorry - I meant that in reference to my pull request (https://github.com/axboe/fio/pull/859) which adds that option.
Is --fallocate=truncate working as intended on Windows? ftruncate() is _chsize() in mingw-w64, and _chsize() extends the file by zero-filling it in synchronous, buffered I/O:
Process Monitor:
17:34:00.2624323 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 4,096, Length: 4,096
17:34:00.2624870 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 16,384, Length: 4,096
17:34:00.2625429 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 32,768, Length: 4,096
17:34:00.2627554 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 65,536, Length: 4,096
....
17:34:08.0918914 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 2,147,155,968, Length: 4,096
17:34:08.0921128 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 2,147,221,504, Length: 4,096
17:34:08.0923003 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 2,147,287,040, Length: 4,096
17:34:08.0925019 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 2,147,352,576, Length: 4,096
17:34:08.0926989 test.exe FASTIO_WRITE D:\a.out FAST IO DISALLOWED Offset: 2,147,418,112, Length: 4,096
Stack:
0 FLTMGR.SYS FltpPerformPreCallbacksWorker + 0x36b
1 FLTMGR.SYS FltpPassThroughFastIo + 0xc0
2 FLTMGR.SYS FltpFastIoWrite + 0x165
3 ntoskrnl.exe NtWriteFile + 0x43d
4 ntoskrnl.exe KiSystemServiceCopyEnd + 0x28
5 ntdll.dll NtWriteFile + 0x14
6 KernelBase.dll WriteFile + 0x76
7 ucrtbase.dll write_binary_nolock + 0x52
8 ucrtbase.dll _write_nolock + 0xb2
9 ucrtbase.dll chsize_nolock + 0xb0
10 ucrtbase.dll __crt_seh_guarded_call<int>::operator()<<lambda_63911f43cc86614963252b1f423aad40>,<lambda_522ac1b1e1a1180d6f374a882b81d1c8> &,<lambda_6da5eaf298b4749753c721cc8c61e051> > + 0x6f
11 ucrtbase.dll chsize_s + 0xa4
12 ucrtbase.dll chsize + 0xc
13 test.exe wmain + 0x48, D:\test\main.cpp(254)
14 test.exe __scrt_common_main_seh + 0x10c, d:\agent\_work\4\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(281)
So this behavior matches with (4), not (1).
I wasn't aware of the backfill issue, that's really interesting. So it seems like there are a few different ways of going about extending a file on Windows:
1. SetEndOfFile. Will incur a backfill if you write non-sequentially. What Windows copy does.
2. Make the file sparse with FSCTL_SET_SPARSE/FSCTL_SET_ZERO_DATA. No problem with backfill, but like option 1, no blocks are allocated for the file initially. For, e.g., SMB servers, also depends on the underlying storage supporting these FSCTLs.
3. SetFileValidData - more analogous to POSIX fallocate, allocates space for the file but does not require zeroing or transferring data. Also depends on the storage supporting certain control codes.
4. Actually just write out the full file size worth of data. The most portable option. :-)
@extratype That's a fair point - would you like to open a separate issue for that?
I'm using fio to try to benchmark high-throughput, pipelined writes to networked storage on Windows, and I'm running into a slightly annoying issue - in order to actually get good performance, I have to pass --overwrite=1 -- which has a couple of small problems:
On Windows, all appends get synchronized; this is true even when using unbuffered, overlapped IO. The way to get around this issue is to set the length of the file before issuing IO, generally by using the Win32 SetEndOfFile function. (Note the MS documentation I linked says to use SetFileValidData; I suspect this may be an error; using SetEndOfFile to create a sparse file seems to be the "right" way to do this, and this is what Windows Explorer does when copying a file.)
So, the behavior I want is for fio to extend the file sparsely before doing the write workload. I read extend_file (filesetup.c), and it does not appear to support this behavior right now - any time it truncates to extend the size of the file, it appears it will also write data to the entire file.
I'd like to craft a small PR to support this, but I figured I'd float the idea first since there are a couple of different ways one might do it. My first thought was to just add a flag to the thread options, something like --extend-only-truncates (perhaps with a better name), which just changes the behavior of extend_file() to do the truncate but not the writes. On the other hand, maybe this could be something specific to the windowsaio engine, since that's the only place that I'm aware of where you'll see this behavior.
Any thoughts? Is there perhaps a better way to achieve this?