dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Directory.GetDirectories(path) or System.IO library not working correctly in .netcore #49803

Open shubhamjainy opened 3 years ago

shubhamjainy commented 3 years ago

Description

We mapped a SharePoint drive in Windows 10 and tried to enumerate its files and folders in C#. In a .NET Core 3.1 project, the Directory.GetDirectories(path) method returns file and folder names that are not the expected ones: every path has '\0' appended at the end, and two extra entries, '.\0' and '..\0', are returned. Please refer to the attached screenshots showing the output paths.

We also checked the same behavior in a .NET Framework 4.7.2 project, where everything works fine. Please refer to the attached screenshots showing the output paths for this case as well.

The expected behavior is that System.IO works correctly on .NET Core too, as it does on .NET Framework.

Configuration

.NET Core 3.1, .NET Framework 4.7.2, Windows 10 x64/x86, SharePoint mapped as a drive on the system

This was tested and works fine in .NET Framework 4.7.2.

Attached are screenshots for both the .NET Core and .NET Framework projects, as well as sample projects you can use to test this behavior.

Please let us know if we are doing something wrong or if another library exists for this. Thank you.

[Attachments: screenshot "netcore", screenshot "framework", projects.zip]

danmoseley commented 3 years ago

Can you please try .NET 5.0 and let us know whether you get the same result?

shubhamjainy commented 3 years ago

Yes, I have tested on .NET 5.0 and the behavior is the same as on .NET Core 3.1. The problem persists in .NET 5.0.

Is there any alternative way of enumerating these files and folders? We are blocked by this issue and any help would be highly appreciated.

danmoseley commented 3 years ago

@shubhamjainy while this is clearly a bug, can you not remove the \0 yourself? Just trim it off the ends?
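The trimming suggested here could look roughly like the sketch below. CleanEntries is a hypothetical helper, not part of System.IO, and the input array only simulates the strings described in the report (real code would pass the result of Directory.GetDirectories instead). It strips trailing '\0' characters and drops the "." and ".." entries; it does not address any other incorrect entries the enumeration may return.

```csharp
using System;
using System.Collections.Generic;

// Simulated enumeration output (an assumption standing in for
// Directory.GetDirectories on the mapped drive): each entry ends with '\0'.
string[] simulated = { @"Z:\Reports" + "\0", @"Z:\." + "\0", @"Z:\.." + "\0" };

foreach (string entry in CleanEntries(simulated))
{
    Console.WriteLine(entry);
}

// Hypothetical helper: trims trailing null characters and skips "." and "..".
static IEnumerable<string> CleanEntries(IEnumerable<string> rawEntries)
{
    foreach (string raw in rawEntries)
    {
        string path = raw.TrimEnd('\0');

        // Take the last path segment without relying on the current OS's separator.
        int separator = path.LastIndexOfAny(new[] { '\\', '/' });
        string name = path.Substring(separator + 1);

        if (name == "." || name == "..")
        {
            continue;
        }

        yield return path;
    }
}
```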

shubhamjainy commented 3 years ago

@danmoseley Yes, we can handle that.

But the major part of the problem is: 1) we get two extra entries, '.\0' and '..\0', for every directory and sub-directory; 2) we keep getting incorrect sub-directories even though they are not actually present. To understand this point more accurately, I request that you set up the required environment and run the sample .NET Core project attached above. You will observe that the directory and file count is much higher than the actual count.

Clockwork-Muse commented 3 years ago

The last two points can probably be worked around by using one of the overloads, GetDirectories(path, searchPattern, enumerationOptions), and setting ReturnSpecialDirectories to false.
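A minimal sketch of that overload follows. It runs against a temporary local directory (an assumption made so the example is self-contained), not a mapped SharePoint drive, so it only shows the call shape. Note that ReturnSpecialDirectories already defaults to false on EnumerationOptions, so setting it explicitly mainly documents intent.

```csharp
using System;
using System.IO;

// Temporary local directory used only for illustration.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "child"));

var options = new EnumerationOptions
{
    ReturnSpecialDirectories = false, // skip "." and ".." (this is also the default)
    RecurseSubdirectories = false,
    IgnoreInaccessible = true
};

// Overload taking a search pattern and EnumerationOptions.
string[] dirs = Directory.GetDirectories(root, "*", options);

foreach (string dir in dirs)
{
    Console.WriteLine(dir);
}

// Clean up the temporary directory.
Directory.Delete(root, recursive: true);
```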

shubhamjainy commented 3 years ago

We tried this too, but it does not work either. Is there any other workaround we could try? Also, will this issue be fixed in an upcoming patch?

carlossanlop commented 3 years ago

This is expected behavior. It's an intentional breaking change that was introduced in .NET Core 2.1. The key changes are described in this document.

Among the Key behavior changes, we do this:

We only check for embedded nulls, no other chars are rejected, including wildcards (as nulls are never supported and OS APIs almost universally take null terminated strings)

And the FAQ explains what to do if a null character is found in a path:

What if I still want to check invalid characters? You can do this manually using GetInvalidPathChars(). It isn't recommended as it isn't always correct on any platform. You may have NTFS/FAT volumes mounted in Unix or vice-versa.

Here is the code that DirectoryInfo.EnumerateFiles calls to normalize the specified path. It deliberately throws when a null character is found:

https://github.com/dotnet/runtime/blob/79ae74f5ca5c8a6fe3a48935e85bd7374959c570/src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEnumerableFactory.cs#L42-L46

Dan's suggestion to manually remove the null character is the best way to address this issue.

If you want an alternative workaround, you could use FileSystemEnumerable, which has great flexibility to add your own filters:

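using System;
using System.IO;
using System.IO.Enumeration;

var enumeration = new FileSystemEnumerable<string>(
    directory: @"\\Server\share\folder",
    // Build the full path for each entry and remove any '\0' characters.
    transform: (ref FileSystemEntry entry) => entry.ToFullPath().Replace("\0", ""),
    options: new EnumerationOptions()
    {
        IgnoreInaccessible = true,
        RecurseSubdirectories = true,
        ReturnSpecialDirectories = false
    })
{
    // Keep only files (not directories) with an .xls or .xlsx extension.
    ShouldIncludePredicate = (ref FileSystemEntry entry) =>
    {
        // Trim trailing '\0' first so the extension check still matches affected names.
        ReadOnlySpan<char> name = entry.FileName.TrimEnd('\0');
        return !entry.IsDirectory && (name.EndsWith(".xls") || name.EndsWith(".xlsx"));
    }
};

foreach (string filePath in enumeration)
{
    Console.WriteLine(filePath);
}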

I'll close this issue since it's expected behavior. Let us know if you have any additional questions.

jkotas commented 3 years ago

@carlossanlop This behavior is not expected. I do not think that it can be justified by the .NET Core 2.1 breaking change.

It is either a bug in how we are parsing the bulk file enumeration responses (it would not be the first one) or a bug in the SharePoint file system driver. Could you please look into that?

jkotas commented 3 years ago

"getting two extra entries '.\0' and '..\0' for every directory/sub-directory."

This in particular is an indication that it is a bug.

adamsitnik commented 2 years ago

@jkotas another customer has hit this problem in https://github.com/dotnet/runtime/issues/62429. Do you think that it would meet the servicing requirements to backport the fix (once we have it)?

lscorcia commented 2 years ago

I tried source-stepping through the .NET code and it seems the error comes directly from the file system enumeration process:

[image]

Even if the file name is exactly one character long, the FileNameLength property returns the value 4. Looking at the Interop.FILE_FULL_DIR_INFORMATION class, the FileName is actually computed from "the initial byte returned by Win32 API + (the reported length of the string / sizeof(char))":

https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/Common/src/Interop/Windows/NtDll/Interop.FILE_FULL_DIR_INFORMATION.cs#L56-L57

MSDN for the FILE_FULL_DIR_INFORMATION struct (https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/ns-ntifs-_file_full_dir_information ) does not specify whether the length is expressed in bytes or in characters. In UTF-16 (little-endian), "." followed by a null terminator is 2E 00 00 00, so 4 bytes long. This seems to imply that FileNameLength is expressed in bytes and that it includes the trailing \0, which means that the FileName ReadOnlySpan is initialized with a length of (4 / sizeof(char)) = 2.
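The arithmetic above can be reproduced with a small sketch. The byte buffer and the reported length of 4 are simulated values taken from this comment, not output from the actual Win32 call; the point is only to show how a byte count that includes the terminator turns a one-character name into the two-character string ".\0".

```csharp
using System;
using System.Text;

// Simulated name buffer: UTF-16LE "." followed by a null terminator.
byte[] nameBuffer = { 0x2E, 0x00, 0x00, 0x00 };

// Assumption from the debugging session: the reported length is 4 bytes,
// i.e. it counts the terminator as part of the name.
int reportedLengthInBytes = 4;

// Length of the resulting name in chars: 4 / sizeof(char) = 2.
int charCount = reportedLengthInBytes / sizeof(char);

// Decoding the full reported length keeps the terminator inside the string.
string nameAsReported = Encoding.Unicode.GetString(nameBuffer, 0, reportedLengthInBytes);

// Trimming the trailing null recovers the one-character name.
string nameTrimmed = nameAsReported.TrimEnd('\0');

Console.WriteLine(charCount);             // 2
Console.WriteLine(nameAsReported.Length); // 2
Console.WriteLine(nameTrimmed.Length);    // 1
```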

This inclusion of the null terminator seems weird to me. Win32 APIs that explicitly return a string length generally don't return additional null chars at the end of the buffer. Also, I verified that it's not coming from the XML WebDav response (actually the '.' and '..' folders do not show up in there at all).

But then, why is it working on .NET Framework 4.8? Surely the Win32 API returns the same values on the full framework too, IF it is even using the same API. Unfortunately, debugging .NET Framework does not lead me anywhere: the code is optimized and I can't step into it.

Maybe the actual root issue could be a bug in the WebDav client stack (i.e. the extra null byte should not be returned)? This would explain why the issue does not show up when enumerating local folders, and maybe the consequences are not visible in .NET Framework 4.8 because it unmarshals buffers to strings instead of Spans? Maybe unmarshalling to string adds some extra processing related to null terminators?
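To illustrate this hypothesis (not to confirm how .NET Framework actually reads the data), the sketch below contrasts two ways of turning the same native UTF-16 buffer into a managed string. Reading up to the first null terminator hides an over-long reported length, while reading an explicit number of characters preserves the embedded '\0'. The buffer is created locally with Marshal.StringToHGlobalUni purely for demonstration.

```csharp
using System;
using System.Runtime.InteropServices;

// Native buffer containing '.' followed by a null terminator (2 chars, 4 bytes).
IntPtr nativeName = Marshal.StringToHGlobalUni(".");

// Reads until the first null terminator, so the terminator is not part of the result.
var nullTerminatedRead = Marshal.PtrToStringUni(nativeName);

// Reads exactly 2 chars, as a length-based read would if the reported
// length included the terminator.
var lengthBasedRead = Marshal.PtrToStringUni(nativeName, 2);

Marshal.FreeHGlobal(nativeName);

Console.WriteLine(nullTerminatedRead.Length); // 1
Console.WriteLine(lengthBasedRead.Length);    // 2
```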

jkotas commented 2 years ago

"Do you think that it would meet the servicing requirements to backport the fix (once we have it)?"

We need to understand the root cause of the problem and the fix first.

"why is it working on framework 4.8?"

.NET Framework 4.8 used a different, slower file enumeration API.

"Maybe the actual root issue could be a bug in the WebDav client stack (i.e. the extra null byte should not be returned)?"

Yes, it may be the case.

lscorcia commented 2 years ago

Hello @jkotas, any update on this? Could you ping someone on the Windows team to verify if the issue is in the WebDav client stack?

jkotas commented 2 years ago

I have sent an email to the Windows team. They asked for a support ticket to be opened against them: "Customer needs to open up a case with CSS Windows team and we go from there."

Could you please open a ticket using https://support.microsoft.com/ and share the link here?

lscorcia commented 2 years ago

Well, I tried and got bounced, as it requires a paid support contract (?):

[image]

The ticket number was 1040839458, if you can refer to it anyway.

LLUDD commented 8 months ago

Any updates?

comesx4 commented 5 days ago

Any updates?