dokan-dev / dokan-dotnet

Dokan DotNet Wrapper
http://dokan-dev.github.io
MIT License
458 stars 115 forks source link

`ReadFile`: semantics of `bytesRead` #352

Open lostmsu opened 7 months ago

lostmsu commented 7 months ago

Just spent a couple of hours trying to figure out an issue with my FS.

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

I was calling .NET Stream.Read with the buffer, and my filesystem would return garbage to some programs. The reason is Stream.Read by its contract does not have to fill your entire buffer. For example, when reading from a TCP connection if you specify 1MiB buffer and 1KiB packet arrives, NetworkStream will fill 1KiB and return 1024 even though the connection is still open and it will be possible to read more data in the future.

Now I don't know what the kernel APIs expect in this case, but IMHO one of the following should be done:

Liryna commented 7 months ago

Hi @lostmsu ,

Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length are within the file length, and read exactly remaining bytes when offset + buffer.Length are more than file length.

Dokan does use bytesRead to inform the system how much was actually read. It does not expect that the buffer is always fill. That said, the calling application must handle the bytesRead it will receive and reissue a read to get the next data.

lostmsu commented 7 months ago

The scenario I worked with is dumpbin from C++ build tools, and it failed with my FS until I did fill the buffer (but didn't fail on a raw file).

There might be a bug in .NET wrapper. In the decompilation it looked like it copies the whole buffer regardless of bytesRead for some reason, which is suspect. Perhaps some programs rely on only bytesRead to be overwritten. E.g. if they read into a circular buffer.

LTRData commented 7 months ago

I assume that you have seen this line: https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet%2FDokanOperationProxy.cs#L411

And yes, it looks like it copies the whole buffer. But the value of rawReadLegth is reported back to the OS because it is a ref parameter, so this should never cause any issues. Have you checked and made absolutely sure that you return the correct read length in the bytesRead output parameter?

lostmsu commented 7 months ago

Reasonably sure. This is the end of the method before the fix

bytesRead = stream.Read(buffer, 0, buffer.Length);
return DokanResult.Success;
LTRData commented 7 months ago

Okay. I think it would be interesting in your case to just try and change the Marshal.Copy code in the library to use the reported length instead of initial length and see if that makes any change in this case.

lostmsu commented 6 months ago

So I tried the suggested change, and dumpbin is still misbehaving when I read less than requested. Could still be bug elsewhere in dokan, but maybe it is a bug in dumpbin, where they unreasonably expect to read the exact amount requested.

This is what I put into DokanOperationProxy:

if (rawReadLength < 0)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be greater than or equal to zero.");
if (rawReadLength > rawBufferLength)
    throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be less than or equal to the length of the buffer.");
Marshal.Copy(buffer, 0, rawBuffer, rawReadLength);

This is what I put into Mirror.ReadFile (+ further down replaced all buffer.Length with read):

int read = Math.Min(buffer.Length, 128);

I copied apphost.exe to the mirrored folder and tried to dumpbin /dependents mirror\apphost.exe which gave

LINK : warning LNK4094: 'apphost.exe' is an MS-DOS executable; use EXEHDR to dump it

When I replace with read = buffer.Length it works correctly.

The sad part is that if Microsoft tool is misbehaving like that, how many other apps silently rely on a similar behavior? I guess I will have to ensure full reads now in all file system implementations for compatibility purposes. 😭

LTRData commented 6 months ago

Thanks for testing! Based on your results, it seems like something we probably need to address somehow. I'll take a closer look at this next week when I am back in office.

lostmsu commented 6 months ago

BTW, if you are considering forcing the compatibility thing at Dokan level, I think simply repeating reads until buffer is filled is going to break potential scenarios where FS is backed by something like sockets.

Liryna commented 6 months ago

I have confirmed that bytesRead behave as expected. The caller will receive the value set by the FS implementation. The only thing I saw is that the library did not use this value to copy the data to the destination buffer https://github.com/dokan-dev/dokan-dotnet/commit/2f15c8a0bd2e84ace51661c2d0b52c8438547fab Not sure if this solves dumpbin case but worth a try.

Procmon might also be useful to see what is going on with dumpbin.

@lostmsu Are you able to reproduce the issue with the C# Mirror ? C Mirror / Memfs ? It would help find out where the issue is

lostmsu commented 6 months ago

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

lostmsu commented 6 months ago

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

LTRData commented 6 months ago

@Liryna please see above. You can repro the issue with Mirror if you limit the amount of bytes read to 128 in ReadFile implementation.

But that is not a correct implementation. If fewer than requested bytes are returned for a file by the file system, it means that the read reached end of file. It is not like reading from devices such as communication channels. So, effectively, dumpbin will obviously treat the file as just 128 bytes long if you do this (which means it will only get the MZ header and not the PE header and assume it is a DOS exe etc etc).

LTRData commented 6 months ago

@Liryna also, see my suggested change instead of your fix, that guards against bad number returned by bytesRead. It is nice to have foot guards 😁

It is a bit of an issue when an implementation actually wants to return fewer than requested bytes. That should be perfectly fine to do, to indicate that only that much data was available towards the end of the file.

lostmsu commented 6 months ago

@LTRData the guards are against negative values and values that are higher than requested.

lostmsu commented 6 months ago

From MSDN it sounds like @LTRData is correct image

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached just to prevent accidental incorrect implementations that follow .NET Stream behavior. But there should be a way for a file system implementation to opt out of this behavior somehow.

LTRData commented 6 months ago

From MSDN it sounds like @LTRData is correct image

E.g. a file system that serves files should always read the requested amount when possible.

IMHO, what should be done is by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached, but there's should be a way for an FS implementation to opt out of this behavior somehow.

Sorry but I do not agree. That would limit possibilities for implementations that actually want to return fewer bytes read to indicate reached end of file and to avoid further unnecessary reads. Dokan should not send another unnecessary read request to the implementation in that case.

lostmsu commented 6 months ago

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

LTRData commented 6 months ago

@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior. E.g. a property on IDokanOperations, or another optional interface, or something else.

Yes, but I think it should be the other way around in that case. There should be some kind of compatiblity layer to implement if this behavior with subsequent reads are needed by the implementation. Because this is not how file system implementations normally handle requests.

lostmsu commented 6 months ago

The problem with the "other way around" is that is just calls for bugs. BTW, the whole conversation should also apply to dokan native repository, because in C standard library read call has the same semantics as .NET Stream.Read - the only requirement is that 0 means end of file.

For instance, I have not looked at SSHFS implementation, but there's a good chance that it exhibits the same issue because it reads from the network, and network does not have the semantics used in Windows ReadFile call for files.

lostmsu commented 6 months ago

And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.

Liryna commented 6 months ago

Agree there is no good reason for the library to retry reads. It is unable to know why the file system returned this value. Retry could be bad. Even if it is an option, maybe one read case returned less bytes for a good reason and a retry could make it worse.

LTRData commented 6 months ago

And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.

No. That is not an issue. For regular files, .NET Stream reads and C library reads always return the number of requested bytes except towards end of file. The concept of returning fewer bytes and reads should be retried only applies to things like communication devices, network sockets and similar.

Edit: to be more specific, the read that you linked to in sshfs is not a read from a network socket in a way that would be an issue here. And if it was, it would have been really bad and practically would never had worked.

lostmsu commented 6 months ago

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

LTRData commented 6 months ago

.NET Stream reads and C library reads always return the number of requested bytes

This is never guaranteed, and maybe true right now on Windows, but actually may not be.

For instance, from FileStream.Read documentation:

Returns Int32

The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.

It is never guaranteed as a general concept for everything that could be read from. But for regular files it definitely is, on all platforms and frameworks. There are a huge number of things in many places that would break otherwise.

lostmsu commented 6 months ago

I'm sorry, but what you say just doesn't match the documentation that I quoted.

Unlike the ReadFile documentation I quoted above, FileStream.Read doesn't make any guarantees like this. And same applies for read in C spec.

lostmsu commented 6 months ago

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

LTRData commented 6 months ago

In fact, there's more down there:

The method is free to return fewer bytes than requested even if the end of the file stream has not been reached

Even Win32 ReadFile frequently returns fewer bytes in cases like network sockets, serial ports etc (the documentation for ReadFile that you quoted is wrong/incomplete in this way).

But this does not happen for regular files.

In any case, it is something that you need to take into account if you are reading from sources that could behave like this. But the code in sshfs that you linked to is not an example of such scenario because there they read from a steam buffer that is known to the implementation and the exact behavior is well defined. You can compare that to when they read from sockets and similar, which is entirely different in this aspect.