dokan-dev / dokan-dotnet

Dokan DotNet Wrapper
http://dokan-dev.github.io
MIT License
462 stars 116 forks source link

DokanNetMirror does not return "." (self) and ".." (parent) directories in `FindFiles` #343

Closed lostmsu closed 1 year ago

lostmsu commented 1 year ago

TL;DR;

I launch DotNetMirror to mirror Original into Mirror folder. It shows in Explorer as a mount, I can open it.

Launch a 3rd party game with its data path set to a subfolder of Mirror folder.

By the time the game finishes loading, the mount disappears together with the NTFS directory where it was mounted to. Later game as part of its functioning recreates the directory as a regular NTFS directory (I suspect it just sees mount point gone).

A potential clue from the game is

22.750 Warning FileUtil.cpp:447: Permission denied trying to remove C:\Users\lost\Projects\Borg\Debug\DummySaves\Factorio\temp...lock. Will set write permissions and try again 22.844 Error FileUtil.cpp:469: Couldn't remove C:\Users\lost\Projects\Borg\Debug\DummySaves\Factorio\temp...lock: The process cannot access the file because it is being used by another process.

Repro steps

  1. Download and install Factorio Demo. Full game should also work.
  2. Create a directory Original anywhere on existing NTFS volume.
  3. Copy this file to Original: config.txt
  4. Rename it to config.ini
  5. Edit the file and change write-data= to point to the full path to Mirror\UserData (might need to precreate Original\UserData)
  6. Launch DokanNetMirror to mirror Original into Mirror (I did NOT use unsafe)
  7. Check that Mirror has config.ini with write-data= pointing to C:\Path\To\Mirror\UserData
  8. Launch factorio.exe --config C:\Path\To\Mirror\config.ini
  9. Wait and watch the contents of Mirror directory

What happens

As mentioned above, at some point the mount just disappears, and the only clue is the error from the game console output.

The game will shortly recreate it as a regular directory, but the Original will no longer be updated.

DokanNetMirror continues to work at this point, and no errors are displayed. Actually, there are no more logs from it from this point on. The log just stops until I Ctrl+C it at which point it prints Success and quits as if everything is OK.

This one is from DokanNetMirror when I do above steps (instead of UserData I use Factorio but that should not matter): mirror.log

Liryna commented 1 year ago

Hi @lostmsu ,

are you able to reproduce with the native C mirror ? Usually when the mount disappears it means a userland request was taking too long to process a request (did not response) and the kernel force unmount the drive to avoid possibly hanging the system too. But in that case you should also see the mirror unmounting and not thinking it is still alive which is very weird.

lostmsu commented 1 year ago

Hi @Liryna , just tried with the native mirror, and it does not reproduce.

Liryna commented 1 year ago

Interesting, are you able to attach a debugger and see the running threads just after the device unmount ? It would be great if it shows the hanging thread / request.

lostmsu commented 1 year ago

@Liryna it does not look like timeout. In the game log, the permission denied message appears slightly over 1 second after a message about successful initialization.

Unfortunately I don't have game source code, so I can not pinpoint the moment when the game hits the error.

lostmsu commented 1 year ago

Even funnier thing. Turns out when I close the game after the mount already disappeared from Explorer, the game is still able to write log file to Original through Mirror I am guessing via previously open handle.

lostmsu commented 1 year ago

I just thought to look into Event Viewer/System and found 2 entries around the time of disappearance from dokan2 saying:

Warning: no matching IRPs found for reply.

And then nothing until I initiated mirror shutdown with Ctrl+C, and there does not seem to be any errors or warnings.

lostmsu commented 1 year ago

FYI I originally experienced similar behavior with another IDokanOperations implementation, but switched to reproducing in Mirror as that's what other issues requested.

lostmsu commented 1 year ago

Is there something I could do short of debugging the driver? Perhaps any guesses, like an unhandled exception?

Liryna commented 1 year ago

By using Procmon you could see the status returned to the system for all the requests and find the one returned with a timeout. Then the next step would be to trace it on user land and see what it is happening

lostmsu commented 1 year ago

Still investigating, but heads up.

This is from native mirror (that works): image

This is from dotnet one (that does not): image

Liryna commented 1 year ago

The insufficient resources is most likely the query that timeout on dotnet side. Should find out if it is received and why not processed/ returned with a status in time.

lostmsu commented 1 year ago

I am debugging it, and I see that the driver returns that error very quickly.

Here's what I did:

I opened the DokanNet.sln, and wrapped the contents of ZwCreateFileProxy in Stopwatch. E.g.

log("enter ZwCreateFileProxy");
var time = Stopwatch.StartNew();

try
{
  ... // the original try-catch block
}
finally
{
  time.Stop();
  logger.Debug("CreateFileProxy : {0} Time : {1}ms",
               rawFileName, time.ElapsedMilliseconds);
}

Then run the game as described above. The path (directory) of interest is "currently-playing-background" as seen in the screenshot.

All the calls for that path get a response (e.g. finally block is called every time), the result is Success, and the maximum time value I see for that directory is 17ms, and yet according to Procmon the call to create the directory fails with INSUFFICIENT RESOURCES like mentioned above.

lostmsu commented 1 year ago

Also, the directory in question as well as couple of others gets a call to SetDispositionInformationEx(FILE_DISPOSITION_POSIX_SEMANTICS | FILE_DISPOSITION_DELETE).

lostmsu commented 1 year ago

OK, I figured it out. The issue has nothing to do with the driver, hangs, or instability. Factorio actually deletes the mount directory for some reason when it encounters a certain error and that causes Windows to detach the virtual FS.

Now the error has to do with magic "." and ".." entries in FindFiles that native mirror returns, and .NET mirror does not.

I feel like either .NET version needs to make a breaking change and add them automatically for non-root directories, or DokanOperationProxy should check the list of returned files for presence of "." and ".." when path is not \ and if they don't appear, make some loud warning so that developers would see that their implementation is incorrect.

lostmsu commented 1 year ago

Looks like https://github.com/dokan-dev/dokan-dotnet/commit/7fa8ca4dd709d93155ac0915aefbd7977fbd413e got lost at some point so https://github.com/dokan-dev/dokany/issues/131 regressed.

Liryna commented 1 year ago

Nice catch! It was moved to the native library to avoid having all the wrappers introducing the samelogic https://github.com/dokan-dev/dokany/blob/c4014d846c4092735afff21b1fd8eefc04d44ef8/dokan/directory.c#L465 Maybe the issue is in C# mirror that adds them for the root ? https://github.com/dokan-dev/dokany/blob/c4014d846c4092735afff21b1fd8eefc04d44ef8/samples/dokan_mirror/mirror.c#L824-L831

Does EnumerateFileSystemInfos includes them ? If yes, we should remove them like native mirror does https://github.com/dokan-dev/dokan-dotnet/blob/137dbdd6162286b31f89eaa949fdcc2c5d82be26/sample/DokanNetMirror/Mirror.cs#L727

lostmsu commented 1 year ago

No, EnumerateFileSystemInfos does not return the magics.

Is it possible that the native library bundled with .NET lib is old and does not include these items? Another thought that I had after a quick skim is maybe that code path in the native library is not actually getting invoked in certain scenarios: it seems suspicious to me, that Procmon shows Result NO MORE FILES and the only call to AddMissingCurrentAndParentFolder is guarded by if (Status == STATUS_SUCCESS). Could also be a patterns thing, list order, or something weird along the lines.

AFAIK native mirror returns these magics because it uses FindFiles.

If the native code you mentioned would be working, then my explicit addition of . and .. would be a no-op, however it does resolve the issue with Factorio.

Liryna commented 1 year ago

If the native code you mentioned would be working, then my explicit addition of . and .. would be a no-op, however it does resolve the issue with Factorio.

And thanks for finding the source of the issue and the solution!

It is true that the Win32 can return NO_MORE_FILES but in our case as we don't have a buffer limit, we can return the whole listing without having to split them by batch. Therefore an implementation returning NO_MORE_FILES is invalid for me and this is not the case as C# Mirror only return success.

https://github.com/dokan-dev/dokan-dotnet/blob/master/sample/DokanNetMirror/Mirror.cs#L408 https://github.com/dokan-dev/dokan-dotnet/blob/master/sample/DokanNetMirror/Mirror.cs#L747

I believe what is wrong is that the . and .. are append at the end of the listing which could confuse the app as they are normally always at front. (This can be reproduced with cmd.exe dir, the . .. are currently last). https://github.com/dokan-dev/dokany/commit/b41d0226c8137dc4051c31e9f54323c59ff5bc82

Are you cappable to just rebuild the native library and use it in C# to confirm it fixed the problem ? Otherwise I believe you can reproduce the issue with the native sample by changing https://github.com/dokan-dev/dokany/blob/c4014d846c4092735afff21b1fd8eefc04d44ef8/samples/dokan_mirror/mirror.c#L825 to BOOLEAN rootFolder = TRUE;

lostmsu commented 1 year ago

BOOLEAN rootFolder = TRUE; breaks it in native mirror, and https://github.com/dokan-dev/dokany/commit/b41d0226c8137dc4051c31e9f54323c59ff5bc82 then fixes it

Liryna commented 1 year ago

Awesome, Thanks for the confirmation! 🏆 I believe this can be closed now and waiting for a next release. Let me know otherwise