FUSE API possible race condition upon file closing (and re-opening)

dokan-dev / dokany

User mode file system library for windows with FUSE Wrapper

5.21k stars 661 forks source link

Check List

[x] I checked my issue doesn't exist yet

[?] My issue is valid with mirror default sample and not specific to my user-mode driver implementation -- only a FUSE issue

[x] I can always reproduce the issue with the provided description below.

[x] I have updated Dokany to the latest version and have reboot my computer after.

[x] I tested one of the last snapshot from appveyor CI

Description

I'm experiencing an intermittent issue when running the CrystalDiskMark benchmark software on encfs4win, where sometimes the write portion of the test will fail (and sometimes crash encfs4win). I think it might be a problem for the order that cleanup/close and createfile are happening.

It looks like the benchmark software does a Read -> Close -> Re-open -> Read, etc.

Failure:

###Read 0133 ReadFile: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp from 0 len 32768 ###Read 0133 ReadFile: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp from 52396032 len 32768 ###Read 0133 ReadFile: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp from 61440 len 32768 ###Cleanup 0133 Cleanup: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp ###Create 0135 CreateFile: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp DesiredAccess: FILE_GENERIC_WRITE|FILE_GENERIC_READ ShareAccess: 0x0 Disposition: FILE_OPEN (1) Attributes: 128 (0x80) Options: 108 (0x6c) ###Close 0133 Close: \CrystalDiskMark006A6F99\CrystalDiskMark006A6F9A.tmp CreateFile status = c0000022

Success:

###Read 0079 ReadFile: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp from 49283072 len 1048576 ###Read 0079 ReadFile: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp from 50331648 len 1048576 ###Read 0079 ReadFile: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp from 51380224 len 1048576 ###Cleanup 0079 Cleanup: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp ###Close 0079 Close: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp 2016-11-15 01:05:20,875 VER [RawFileIO.cpp:116] open call, requestWrite = 0 2016-11-15 01:05:20,875 VER [RawFileIO.cpp:122] using existing file descriptor ###Create 0081 CreateFile: \CrystalDiskMark00523388\CrystalDiskMark0052338B.tmp DesiredAccess: FILE_GENERIC_WRITE|FILE_GENERIC_READ ShareAccess: 0x0 Disposition: FILE_OPEN (1) Attributes: 128 (0x80) Options: 108 (0x6c)

Since the problem happens intermittently, that makes me think it is a race condition. You can see that when it fails, the order is "CleanUp -> CreateFile -> Close" and when it's a success the order is "CleanUp -> Close -> CreateFile".

I think the "Close" after the "CreateFile" is closing the new CreateFile (and not the old) since the second CreateFile is being called before "CleanUp -> Close" has completed. Do you think this is a possibility?

This (or something very similar) also happens when running CrystalDiskMark on a CryFS filesystem. CryFS uses dokany-fuse. Steps to repro:

Mount a CryFS filesystem:
```
cryfs.exe basedir J: -f -o debug
```
(note that you need to use CryFS from this commit or any release after 0.10.2 (if released by the time you're trying to reproduce), otherwise you'll run into another issue that was CryFS's fault).
Run CrystalDiskMark on it. I used CrystalDiskMark 5.5.0 x64 with the settings: 1 iteration, 1GiB file size, on drive J:, then click the "4K" button in the last row to start the benchmark.
Observe the file system crash after the benchmark completed.

Further info:

I built CryFS against DokanY 1.2.2-1000.
CryFS logs indicate that CrystalDiskMark flushes, then closes and then deletes its benchmark file from the file system, but if the benchmark file is too large (that's why we selcted 1GiB above), this seems to cause a race condition. It crashes inside the DokanY library, not in CryFS code. The log actually shows that the operation in CryFS are executed sequentially, i.e. close/release returns before delete/unlink starts. But the stack trace shows it fails somewhere in DokanY in a fileClose call where the file info seems to be invalid. For log contents and stacktrace, see below.
When running CryFS with -- -s (which forwards the -s to DokanY), i.e. running it in single threaded mode, then the error disappears. So it might be a race condition. When, however, running it without -s but adding a global mutex to the CryFS code that ensures that no two CryFS file system operations can run at the same time, then the error still happens. So it seems to be a race condition in the DokanY code, not in the CryFS code.
When running CrystalDiskMark with a test file size of 10MB, then the error disappears. Probably because file deletion is too fast to cause a race condition.

Log contents:

This is what happens at the time of crash. The lines containing [cryfs] are logs from CryFS that are triggered at the start an end of each CryFS operation, i.e. right after DokanY calls into CryFS and right before it returns to DokanY.

###GetFileInfo 0018
GetFileInfo: : \CrystalDiskMark09B2EDEB\CrystalDiskMark09B2EE18.tmp
[2019-06-01 19:47:25.270] [cryfs] [debug] getattr("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _, _)
[2019-06-01 19:47:25.271] [cryfs] [debug] getattr("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _, _): success
        result =  0
        FileInternalInformation
        DispatchQueryInformation result =  0
###Cleanup 0018
Cleanup: \CrystalDiskMark09B2EDEB\CrystalDiskMark09B2EE18.tmp

[2019-06-01 19:47:25.275] [cryfs] [warning] flush("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _)
[2019-06-01 19:47:25.275] [cryfs] [warning] flush("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _): success
[2019-06-01 19:47:25.276] [cryfs] [debug] release("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _)
[2019-06-01 19:47:25.276] [cryfs] [debug] release("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp", _): success
[2019-06-01 19:47:25.277] [cryfs] [debug] unlink("/CrystalDiskMark09B2EDEB/CrystalDiskMark09B2EE18.tmp")
###Close 0018
###Create 0019
CreateFile: \
        DesiredAccess: SYNCHRONIZE|FILE_READ_DATA
Close: \CrystalDiskMark09B2EDEB\CrystalDiskMark09B2EE18.tmp

Stacktrace: This is the stacktrace: https://snag.gy/d5pinJ.jpg

Note that, a few frames up in the stack, the file info is invalid: https://snag.gy/cNUwuv.jpg

dokan-dev / dokany

FUSE API possible race condition upon file closing (and re-opening) #404

Environment

Check List

Description