Open lostmsu opened 8 months ago
Hi @lostmsu ,
Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length is within the file length, and read exactly the remaining bytes when offset + buffer.Length is more than the file length.
Dokan does use bytesRead to inform the system how much was actually read. It does not expect the buffer to always be filled.
That said, the calling application must handle the bytesRead it receives and reissue a read to get the next data.
The scenario I worked with is dumpbin
from C++ build tools, and it failed with my FS until I did fill the buffer (but didn't fail on a raw file).
There might be a bug in the .NET wrapper. In the decompilation it looked like it copies the whole buffer regardless of bytesRead for some reason, which is suspect. Perhaps some programs rely on only the first bytesRead bytes being overwritten, e.g. if they read into a circular buffer.
I assume that you have seen this line: https://github.com/dokan-dev/dokan-dotnet/blob/master/DokanNet%2FDokanOperationProxy.cs#L411
And yes, it looks like it copies the whole buffer. But the value of rawReadLength is reported back to the OS because it is a ref parameter, so this should never cause any issues. Have you checked and made absolutely sure that you return the correct read length in the bytesRead output parameter?
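For reference, the path in question is roughly the following (an illustrative sketch only, not the actual proxy source; the names approximate the proxy's locals):
var status = operations.ReadFile(fileName, buffer, out var bytesRead, offset, fileInfo); // FS implementation fills buffer and reports bytesRead
rawReadLength = bytesRead;                                                               // ref parameter, reported back to the OS
Marshal.Copy(buffer, 0, rawBuffer, (int)rawBufferLength);                                // copies the whole managed buffer, regardless of bytesRead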
Reasonably sure. This is the end of the method before the fix:
bytesRead = stream.Read(buffer, 0, buffer.Length);
return DokanResult.Success;
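After the fix, the ending is roughly this (a paraphrased sketch, not the exact code): keep reading until the buffer is full or Stream.Read reports end of file by returning 0.
int total = 0;
while (total < buffer.Length)
{
    int n = stream.Read(buffer, total, buffer.Length - total);
    if (n == 0)
        break;        // Stream.Read returns 0 only at end of file
    total += n;
}
bytesRead = total;
return DokanResult.Success;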
Okay. I think it would be interesting in your case to change the Marshal.Copy code in the library to use the reported length instead of the initial length and see if that makes any difference in this case.
So I tried the suggested change, and dumpbin is still misbehaving when I read less than requested. It could still be a bug elsewhere in Dokan, but maybe it is a bug in dumpbin, which unreasonably expects to read the exact amount requested.
This is what I put into DokanOperationProxy:
if (rawReadLength < 0)
throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be greater than or equal to zero.");
if (rawReadLength > rawBufferLength)
throw new ArgumentOutOfRangeException("bytesRead", rawReadLength, "bytesRead must be less than or equal to the length of the buffer.");
Marshal.Copy(buffer, 0, rawBuffer, rawReadLength);
This is what I put into Mirror.ReadFile (+ further down replaced all buffer.Length with read):
int read = Math.Min(buffer.Length, 128);
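Roughly, the whole method then looks like this (paraphrasing the Mirror sample from memory rather than quoting it exactly; GetPath and the info.Context handling are the sample's own):
int read = Math.Min(buffer.Length, 128);          // artificially cap each read at 128 bytes
if (info.Context == null)                         // no open stream cached for this handle
{
    using (var stream = new FileStream(GetPath(fileName), FileMode.Open, System.IO.FileAccess.Read))
    {
        stream.Position = offset;
        bytesRead = stream.Read(buffer, 0, read); // was: buffer.Length
    }
}
else
{
    var stream = (FileStream)info.Context;
    lock (stream)                                 // protect against overlapped reads
    {
        stream.Position = offset;
        bytesRead = stream.Read(buffer, 0, read); // was: buffer.Length
    }
}
return DokanResult.Success;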
I copied apphost.exe to the mirrored folder and tried dumpbin /dependents mirror\apphost.exe, which gave:
LINK : warning LNK4094: 'apphost.exe' is an MS-DOS executable; use EXEHDR to dump it
When I replace that with read = buffer.Length, it works correctly.
The sad part is that if a Microsoft tool misbehaves like that, how many other apps silently rely on similar behavior? I guess I will have to ensure full reads in all file system implementations now for compatibility purposes. 😭
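(If the implementation targets .NET 7 or later, which is an assumption about the project, Stream.ReadAtLeast can do the looping instead of the manual loop above:)
bytesRead = stream.ReadAtLeast(buffer, buffer.Length, throwOnEndOfStream: false); // fills the buffer or stops at end of file
return DokanResult.Success;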
Thanks for testing! Based on your results, it seems like something we probably need to address somehow. I'll take a closer look at this next week when I am back in office.
BTW, if you are considering forcing the compatibility thing at the Dokan level, I think simply repeating reads until the buffer is filled is going to break potential scenarios where the FS is backed by something like sockets.
I have confirmed that bytesRead behaves as expected. The caller will receive the value set by the FS implementation.
The only thing I saw is that the library did not use this value to copy the data to the destination buffer: https://github.com/dokan-dev/dokan-dotnet/commit/2f15c8a0bd2e84ace51661c2d0b52c8438547fab
Not sure if this solves the dumpbin case, but worth a try.
Procmon might also be useful to see what is going on with dumpbin.
@lostmsu Are you able to reproduce the issue with the C# Mirror? The C Mirror / Memfs? It would help find out where the issue is.
@Liryna please see above. You can repro the issue with Mirror if you limit the number of bytes read to 128 in the ReadFile implementation.
@Liryna also, see my suggested change instead of your fix, which guards against a bad number returned in bytesRead. It is nice to have foot guards 😁
@Liryna please see above. You can repro the issue with Mirror if you limit the number of bytes read to 128 in the ReadFile implementation.
But that is not a correct implementation. If the file system returns fewer bytes than requested for a file, it means that the read reached the end of the file. It is not like reading from devices such as communication channels. So, effectively, dumpbin will obviously treat the file as just 128 bytes long if you do this (which means it will only get the MZ header, not the PE header, and assume it is a DOS exe, etc.).
@Liryna also, see my suggested change instead of your fix, which guards against a bad number returned in bytesRead. It is nice to have foot guards 😁
It is a bit of an issue when an implementation actually wants to return fewer than requested bytes. That should be perfectly fine to do, to indicate that only that much data was available towards the end of the file.
@LTRData the guards are against negative values and values that are higher than requested.
From MSDN it sounds like @LTRData is correct.
E.g. a file system that serves files should always read the requested amount when possible.
IMHO, what should be done is that by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached, just to prevent accidentally incorrect implementations that follow .NET Stream behavior. But there should be a way for a file system implementation to opt out of this behavior somehow.
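Very roughly, something like the following hypothetical wrapper (nothing here exists in DokanNet today; only ReadFile is shown, every other IDokanOperations member would just delegate to the wrapped implementation, and the signature follows the Mirror sample):
class FullReadWrapper
{
    private readonly IDokanOperations inner;
    private readonly bool optOut;                 // hypothetical per-implementation opt-out switch

    public FullReadWrapper(IDokanOperations inner, bool optOut = false)
    {
        this.inner = inner;
        this.optOut = optOut;
    }

    public NtStatus ReadFile(string fileName, byte[] buffer, out int bytesRead, long offset, IDokanFileInfo info)
    {
        var status = inner.ReadFile(fileName, buffer, out bytesRead, offset, info);
        if (optOut || status != DokanResult.Success)
            return status;

        // Repeat the call until the buffer is full or the implementation reports end of file (0 bytes).
        while (bytesRead < buffer.Length)
        {
            var rest = new byte[buffer.Length - bytesRead];
            var s = inner.ReadFile(fileName, rest, out var n, offset + bytesRead, info);
            if (s != DokanResult.Success || n == 0)
                break;
            Array.Copy(rest, 0, buffer, bytesRead, n);
            bytesRead += n;
        }
        return status;
    }
}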
From MSDN it sounds like @LTRData is correct.
E.g. a file system that serves files should always read the requested amount when possible.
IMHO, what should be done is that by default Dokan should repeat ReadFile calls from IDokanOperations until the buffer is filled or the end of file is reached, but there should be a way for an FS implementation to opt out of this behavior somehow.
Sorry, but I do not agree. That would limit the possibilities for implementations that actually want to return fewer bytes read to indicate that the end of file was reached and to avoid further unnecessary reads. Dokan should not send another unnecessary read request to the implementation in that case.
@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior, e.g. a property on IDokanOperations, or another optional interface, or something else.
@LTRData that's why I propose there should be an option for an implementation to opt out of this behavior, e.g. a property on IDokanOperations, or another optional interface, or something else.
Yes, but I think it should be the other way around in that case. There should be some kind of compatibility layer to implement if this behavior with subsequent reads is needed by the implementation, because this is not how file system implementations normally handle requests.
The problem with the "other way around" is that it just calls for bugs. BTW, the whole conversation should also apply to the native Dokan repository, because the C standard library read call has the same semantics as .NET Stream.Read: the only requirement is that 0 means end of file.
For instance, I have not looked at the SSHFS implementation, but there's a good chance that it exhibits the same issue, because it reads from the network, and the network does not have the semantics the Windows ReadFile call has for files.
And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.
Agreed, there is no good reason for the library to retry reads. It is unable to know why the file system returned this value, so a retry could be bad. Even if it is an option, maybe a read returned fewer bytes for a good reason and a retry could make it worse.
And to prove my theory: https://github.com/dokan-dev/dokan-sshfs/blob/3577c47f92e6f67be80554a7b1a91a922e627495/DokanSSHFS/DokanOperations.cs#L518 yes, it appears to have this bug.
No. That is not an issue. For regular files, .NET Stream reads and C library reads always return the number of requested bytes except towards end of file. The concept of returning fewer bytes and retrying reads only applies to things like communication devices, network sockets and similar.
Edit: to be more specific, the read that you linked to in sshfs is not a read from a network socket in a way that would be an issue here. And if it was, it would have been really bad and would practically never have worked.
.NET Stream reads and C library reads always return the number of requested bytes
This is never guaranteed; it may happen to be true on Windows right now, but it does not have to be.
For instance, from the FileStream.Read documentation:
Returns Int32
The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
.NET Stream reads and C library reads always return the number of requested bytes
This is never guaranteed; it may happen to be true on Windows right now, but it does not have to be.
For instance, from the FileStream.Read documentation:
Returns Int32
The total number of bytes read into the buffer. This can be less than the number of bytes allocated in the buffer if that many bytes are not currently available, or zero (0) if the end of the stream has been reached.
It is never guaranteed as a general concept for everything that could be read from. But for regular files it definitely is, on all platforms and frameworks. There are a huge number of things in many places that would break otherwise.
I'm sorry, but what you say just doesn't match the documentation that I quoted.
Unlike the ReadFile documentation I quoted above, FileStream.Read doesn't make any such guarantees. And the same applies to read in the C spec.
In fact, there's more down there:
The method is free to return fewer bytes than requested even if the end of the file stream has not been reached
In fact, there's more down there:
The method is free to return fewer bytes than requested even if the end of the file stream has not been reached
Even Win32 ReadFile frequently returns fewer bytes in cases like network sockets, serial ports, etc. (the documentation for ReadFile that you quoted is wrong/incomplete in this way). But this does not happen for regular files.
In any case, it is something that you need to take into account if you are reading from sources that could behave like this. But the code in sshfs that you linked to is not an example of such a scenario, because there they read from a stream buffer that is known to the implementation and the exact behavior is well defined. You can compare that to when they read from sockets and similar, which is entirely different in this aspect.
Just spent a couple of hours trying to figure out an issue with my FS.
Turned out that Dokan expects the implementation of ReadFile to always fill the entire buffer when offset + buffer.Length is within the file length, and read exactly the remaining bytes when offset + buffer.Length is more than the file length.
I was calling .NET Stream.Read with the buffer, and my filesystem would return garbage to some programs. The reason is that Stream.Read by its contract does not have to fill your entire buffer. For example, when reading from a TCP connection, if you specify a 1MiB buffer and a 1KiB packet arrives, NetworkStream will fill 1KiB and return 1024 even though the connection is still open and it will be possible to read more data in the future.
Now I don't know what the kernel APIs expect in this case, but IMHO one of the following should be done:
- documenting the discrepancy
- if this is a bug (bytesRead is not used everywhere it should be), fixing that bug
- making the library repeat ReadFile calls until either buffer is filled or end of file is reached (this would remove the need for documenting the discrepancy)