Open Quuxplusone opened 8 years ago
Attached file_27386.txt
(4987 bytes, text/plain): The MSBuild output of two tests failing because of this bug. There's a third failure (tools/llvm-symbolizer/ppc64.test), unrelated to this bug.
Attached llvm_poor_file_deletion.PNG
(54324 bytes, image/png): By the time I check the folder, it is empty. Windows "marks" files for deletion, and then later actually unlinks/deletes them. Here, the contents of this directory weren't actually deleted by the time that a unittest tried to delete the directory.
Note that the function with the bug is in ./llvm/lib/Support/Windows/Path.inc, NOT in ./llvm/lib/Support/Unix/Path.inc
Sidenote: maybe "System Libraries" is the proper component?
Attached file_27386.txt
(908 bytes, text/plain): This is the source code of llvm::sys::fs::remove, for your convenience.
Maybe bash rm on Windows has the same problem?
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
(In reply to comment #4)
> Maybe bash rm on Windows has the same problem?
>
> http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
Heh, I wouldn't be surprised. I'm a bit rusty on my shell scripting - what do
the changed lines in that commit do?
(In reply to comment #4)
> Maybe bash rm on Windows has the same problem?
>
> http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130437.html
Heh, I wouldn't be surprised. I'm a bit rusty on my shell scripting - what do
the changed lines in that commit do?
The test had similar problem to what you describe: a file deleted by rm still existed when the next command is run.
At first some workarounds were tried to avoid removing the file and immediately using it but it still often failed and was disabled on Windows:
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20150608/130450.html
The rm command may have the same problem you found in llvm::sys::fs:remove and the other programs mentioned.
So, is this a bug in remove or in the users?
Should we just change ProgramTest.cpp to not use a directory and document the restriction on remove? Anyone that really needs to delete directories then should use rename + remove.
If I understand correctly, the problem happens even when erasing one file.
When the system is loaded Windows may defer the actual file deletion.
This may lead to all sorts of problem beside erasing the containing directory
since remove() user expects the file be gone upon return.
Just below Windows\Path.inc remove(), rename() waits in a loop (with sleep of
1ms):
// Retry while we see recoverable errors.
// System scanners (eg. indexer) might open the source file when it is written
// and closed.
bool TryReplace = true;
for (int i = 0; i < 2000; i++) {
if (i > 0)
::Sleep(1);
if (TryReplace) {
...
so remove() may do the same, waiting for the file/path to be actually deleted
before return. Or it can do the rename/delete combo which makes more sense to
avoid the wait loop.
(In reply to comment #8)
> So, is this a bug in remove or in the users?
>
> Should we just change ProgramTest.cpp to not use a directory and document
> the restriction on remove? Anyone that really needs to delete directories
> then should use rename + remove.
I think I'll try and write a patch to fix this (in llvm::sys::fs::remove). It's
better for the function to just do the right thing, because this issue is
subtle enough to begin with that it's unlikely someone will get it right in
their own implementation.
So here's what I'm thinking about the patch:
It'll use the same basic algorithm that Niall Douglas used in Boost AFIO v1
(see:
https://github.com/BoostGSoC13/boost.afio/blob/b35ed5b9783a4dfde9b3d611dfb940739f303c97/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L744),
except using the Win32 API instead of the NT kernel/native API.
(separating into two comments so that my text doesn't evaporate)
AFIO:
1. Opens the file with NtCreateFile (with ShareAccess set to FILE_SHARE_DELETE,
CreateOptions set to FILE_DELETE_ON_CLOSE).
2. then (in random_string) uses the NT kernel/native API RtlGenRandom to
generate a cryptographically random bytebuffer(in random_fill)
3. converts the bytebuffer to a std::string (in to_hex_string)
4. renames the file to the random name with NtSetInformationFile (with
FileInformationClass set to FileRenameInformation)
5. deletes the file by calling NtSetInformationFile with
FileDispositionInformation (where FILE_DISPOSITION_INFORMATION.DeleteFile is
set to true)
(part 2, what I think the patch will do)
Since LLVM doesn't use the NT kernel/native API, we'll need to do it a little
bit differently.
The biggest difference is that we will use CryptGenRandom, the Win32 public
facing API for generating random bytebuffers - internally it calls RtlGenRandom
- and it's already used in LLVM (see ./llvm/lib/Support/Windows/Process.inc
function Process::GetRandomNumber).
We will:
1. Open the file with CreateFileW (with dwShareMode set to FILE_SHARE_DELETE,
dwFlagsAndAttributes set to FILE_FLAG_DELETE_ON_CLOSE)
2. Create a 64 character SmallVector<char>
3. Get a handle to a crypto provider with CryptAcquireContextW, store it in a
ScopedCryptContext, and pass that to CryptGenRandom to fill the SmallVector
with random data
4. Rename the file with MoveFileW, passing the SmallVector.data() casted to a
PCWSTR (??)
5. Delete the file with DeleteFileW (not sure why AFIO explicitly deletes the
file, after opening it with FILE_DELETE_ON_CLOSE, I'm looking into this)
6. Close the file handle.
Questions remaining:
1. Should we use llvm::sys::fs::rename instead of MoveFileW?
2. Should we handle paths longer than "MAX_PATH"? (Which is totally supported,
with a funky syntax)
3. What's going on in AFIO's to_hex_string? All the bit shifting is giving me a
headache. I'm going to ask Niall, the docs say that the bitshifting is part of
packing bytes into the string, but I'm not sure why the bitshifting is
necessary?
4. Do we need to call RemoveDirectoryW to delete an empty directory, or will
DeleteFile do?
Minor comments, the code needs to keep the UTF16 (widenPath) support, see stuff at end of WindowsSupport.h and maybe sys::fs::createTemporaryFile (or the lower level createUniqueEntity) could be reused to create the temporary filename.
Maybe I'm missing something, but how about just using the Windows function
whose entire purpose is to recursively remove a directory and all the files in
it?
SHFILEOPSTRUCT FileOp = {0};
FileOp.hwnd = nullptr;
FileOp.wFunc = FO_DELETE;
FileOp.pFrom = <path of directory to delete with double null termination>;
FileOp.pTo = nullptr;
FileOp.fFlags = FOF_NO_UI;
SHFileOperation(&FileOp);
If we don't care about supporting pre-Vista, then the "recommended" way to do this is through the IFileOperation interface, btw.
Note this is the same code explorer uses when you select a directory and hit Delete, so it definitely is robust under system load.
(In reply to comment #15)
> If we don't care about supporting pre-Vista, then the "recommended" way to
> do this is through the IFileOperation interface, btw.
We don't care about supporting pre-Vista. We only care about Windows 7 and
later (see WindowsSupport.h:30).
(In reply to comment #14)
> Maybe I'm missing something, but how about just using the Windows function
> whose entire purpose is to recursively remove a directory and all the files
> in it?
>
> SHFILEOPSTRUCT FileOp = {0};
> FileOp.hwnd = nullptr;
> FileOp.wFunc = FO_DELETE;
> FileOp.pFrom = <path of directory to delete with double null termination>;
> FileOp.pTo = nullptr;
> FileOp.fFlags = FOF_NO_UI;
> SHFileOperation(&FileOp);
Mainly, It's horrifically slow. I'm also not confident that it's robust just
because explorer uses it... I've had issues deleting directory trees in
explorer under heavy load in the past.
Do you have benchmarks comparing its performance to the existing implementation? I have had probelms deleting from explorer too, but the problems I've had have not been of a racy nature. I've never experienced a problem where if I try to delete something N times it fails the first N-1 times and succeeds the N'th time. The problems have always been "this file appears to be locked for no reason and there's nothing I can to delete it".
Given that IFileOperation is basically the workhorse of file deletion and is what you get every time you press Delete in windows explorer, I have to imagine that Microsoft has tried very hard to optimize it to the best of their ability, and that if it's slower than some other hand-rolled mechanism, there is a good reason for it. Otherwise why wouldn't they use the faster mechanism?
I'm not saying it's impossible they have a poor implementation of IFileOperation, but I think it would be worth doing due diligence and producing some benchmarks to solidify this belief. There's point committing complicated code if simple code will do the job.
(In reply to comment #18)
> Do you have benchmarks comparing its performance to the existing
> implementation?
Actually, no. Right there, that's enough of a reason for me to agree.
> Given that IFileOperation is basically the workhorse of file deletion and is
> what you get every time you press Delete in windows explorer, I have to
> imagine that Microsoft has tried very hard to optimize it to the best of
> their ability, and that if it's slower than some other hand-rolled
> mechanism, there is a good reason for it. Otherwise why wouldn't they use
> the faster mechanism?
>
> I'm not saying it's impossible they have a poor implementation of
> IFileOperation, but I think it would be worth doing due diligence and
> producing some benchmarks to solidify this belief. There's point committing
> complicated code if simple code will do the job.
file_27386.txt
(4987 bytes, text/plain)llvm_poor_file_deletion.PNG
(54324 bytes, image/png)file_27386.txt
(908 bytes, text/plain)