dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Proposal: Directory.GetCaseSensitivity() #34235

Open mikernet opened 4 years ago

mikernet commented 4 years ago

I would like to make the case for adding a method to check directory case-sensitivity:

  1. Windows Subsystem for Linux now supports directory-specific case-sensitivity.
  2. .NET now runs cross-platform and although each OS usually has a typical default, this is not necessarily always true for all volumes.
  3. Network shares could always have different casing rules than local disks.

Having a universal cross-platform way to check if a directory is case-sensitive would help with correctly writing file system agnostic code.

Possible signatures include:

bool Directory.TryGetCaseSensitivity(string path, out bool isCaseSensitive);
DirectoryCaseSensitivity Directory.GetCaseSensitivity(string path);
bool Directory.GetCaseSensitivity(string path); // this is my preference

enum DirectoryCaseSensitivity
{
    CaseSensitive,
    CaseInsensitive,
    DirectoryNotFound,
}

This could also be an instance method on DirectoryInfo or both.

If we went with the bool Directory.GetCaseSensitivity(string path) signature and the directory does not exist then it could either return the value for the closest parent directory that does exist or throw IOException. I would prefer that it does not throw if possible and just returns the case-sensitivity of the closest parent folder.

CyrusNajmabadi commented 4 years ago

> it could either return the value for the closest parent directory that does exist

Note: I don't believe this matches WSL semantics (i.e., this information is not inherited, so you shouldn't/can't presume anything about a nested dir based on its parent dirs). So it's likely best that it throw in the case that the directory is not there.

mikernet commented 4 years ago

If the directory doesn't exist, though, then when it is created it will inherit the parent directory's info by default. That said, I can go either way on this. I'm fine with an exception; the caller can decide what is appropriate in their particular situation based on what they plan to do with that info.

kevingosse commented 4 years ago

I'm curious what use cases this could have. If you already have the name of the folder with the correct case, you're going to use it and it'll work whether the filesystem is case-sensitive or not. If you don't have it, then knowing whether the filesystem is case-sensitive won't change anything: you still don't have the right path. Unless you're going to enumerate the files in the folder to find the right case?

mikernet commented 4 years ago

You can imagine a situation where you are building an IDE that stores relative file paths to related project files. The input paths should be tolerant of case mismatches in the event that a directory is case-insensitive since the paths can be obtained through user input.

I can get most of the way there by getting the "real" name of each directory and matching against that, up until I reach the last existing directory, past which the rest of the path may not exist yet. The suboptimal solution, which is what I'm doing now, is to treat anything after that point as matching only if it has the same case. It would be better to obtain the actual case-sensitivity of that last directory and use it going forward, because that is the case-sensitivity those files will have when they are created at some later point.

EDIT: Never mind, this does not work. See this comment below.

mikernet commented 4 years ago

I don't think this should be added to the BCL just for my particular use case. I'm posting this proposal to gauge interest as I have a strong feeling that others may have their own use cases for determining directory case-sensitivity as well.

I don't know off the top of my head what others are using it for. A long time ago, though, when I wrote some code that attempted to do this for network shares whose case-sensitivity could differ from the Windows OS defaults, I read lots of threads of people discussing how to do it, so there is at least some level of desire for this out there.

Most systems generally assume case-sensitive or case-insensitive based on the OS they are running on, which is a highly flawed approach that results in a lot of bugs, many of which now pop up when you try to use programs on the new case-sensitive directories in Windows or case-sensitive network shares. It would be nice to have a mechanism to tap into this file system info in a cross-platform way in .NET, but I'll admit that I don't know how many people would make use of it.

mikernet commented 4 years ago

P.S. I'd be happy to PR this if it would be accepted.

mikernet commented 4 years ago

Another use case that I just ran into: defaulting directory search behavior to the case sensitivity of the folder being searched. Currently the behavior of Directory.EnumerateFileSystemEntries() is based on the platform case-sensitivity sniffing code here:

https://github.com/dotnet/runtime/blob/master/src/libraries/Common/src/System/IO/PathInternal.CaseSensitivity.cs

Which, as noted:

> This could return invalid results in corner cases where, for example, different file systems are mounted with differing sensitivities.
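
For illustration only, here is a minimal sketch of what a per-directory probe could look like today, without any new BCL API. The `CaseSensitivityProbe` class and its `TryGetCaseSensitivity` method are hypothetical names, not existing .NET APIs. The sketch assumes the caller has write access to the directory and that creating a short-lived file there is acceptable.

```csharp
using System;
using System.IO;

static class CaseSensitivityProbe
{
    // Hypothetical helper, not a BCL API: probes one directory by creating a
    // uniquely named file and looking it up under a differently cased name.
    public static bool TryGetCaseSensitivity(string directory, out bool isCaseSensitive)
    {
        isCaseSensitive = false;
        if (!Directory.Exists(directory))
            return false;

        // The GUID keeps the name unique; the prefix guarantees cased letters.
        string name = "CaseProbe_" + Guid.NewGuid().ToString("N");
        string upper = Path.Combine(directory, name.ToUpperInvariant());
        string lower = Path.Combine(directory, name.ToLowerInvariant());
        try
        {
            using (File.Create(upper)) { }
            // If the lower-cased name finds the file, the directory is case-insensitive.
            isCaseSensitive = !File.Exists(lower);
            return true;
        }
        catch (Exception e) when (e is IOException || e is UnauthorizedAccessException)
        {
            return false; // read-only or inaccessible directory: cannot probe
        }
        finally
        {
            // Best-effort cleanup of the probe file.
            try { File.Delete(upper); }
            catch (IOException) { }
            catch (UnauthorizedAccessException) { }
        }
    }
}
```

A probe like this has obvious drawbacks: it cannot work on read-only locations, it briefly leaves a file behind, and it says nothing about a directory that does not exist yet. Those are the kinds of gaps a file-system-level query would avoid.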

mikernet commented 4 years ago

Getting the "real" properly-cased name of an entry is not possible to do reliably without this functionality. The only way to coerce a path such as C:\some\path\to\some\file.txt to the proper case is to search for each path segment in a case-insensitive manner (e.g., Directory.EnumerateFiles(@"C:\some\path\to\some", "file.txt", new EnumerationOptions { MatchCasing = MatchCasing.CaseInsensitive })) and see what entry name is actually returned. Segments that live in case-sensitive directories should be left untouched, though, for obvious reasons, and there is currently no way to tell which ones those are.
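
As a rough sketch of that per-segment lookup, something like the following could resolve one segment. `RealCase.GetRealName` is a hypothetical helper name. The sketch assumes `name` contains no wildcard characters, since it is passed as the search pattern.

```csharp
using System;
using System.IO;
using System.Linq;

static class RealCase
{
    // Hypothetical helper: returns the on-disk spelling of `name` inside `parent`,
    // or null when no entry matches case-insensitively.
    public static string GetRealName(string parent, string name)
    {
        var options = new EnumerationOptions
        {
            MatchCasing = MatchCasing.CaseInsensitive,
            MatchType = MatchType.Simple,
            AttributesToSkip = 0, // include hidden and system entries too
        };
        return Directory.EnumerateFileSystemEntries(parent, name, options)
                        .Select(p => Path.GetFileName(p))
                        .FirstOrDefault();
    }
}
```

Note that this always matches case-insensitively. In a case-sensitive directory it may return a different entry than the one the caller named, which is exactly the situation that cannot be detected today.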

EDIT: Seems there are other proposals getting attention for resolving the real case of a path, but I still have a couple other use cases for this as indicated above.

GrabYourPitchforks commented 2 years ago

> Network shares could always have different casing rules than local disks.

We'd have to remain mindful of this even if we're never leaving the local machine. NTFS (like SQL) carries its own casing information independent from the rest of the operating system. So even if you know that NTFS is operating in case-insensitive mode, the implementation of that logic will differ from the rest of the app in subtle ways.

For example:

Console.WriteLine(string.Equals(@"c:\Ꮿ.txt", @"c:\ꮿ.txt", StringComparison.OrdinalIgnoreCase)); // prints "True"

This is because the casing tables which ship with Windows (and which .NET calls into for its string manipulation routines) treat the characters 'Ꮿ' and 'ꮿ' as case-insensitive equivalents. However, NTFS's casing tables do not currently contain this relationship, so NTFS would treat these as two distinct files, even when operating in case-insensitive mode.

This impacts APIs like GetRelativePath, which currently assume that the OS casing tables and the NTFS casing tables are equivalent, leading to incorrect responses given certain inputs. Even if GetRelativePath is enlightened to query individual directories for case sensitivity flags, maintaining the assumption about how the casing tables behave could continue to give incorrect results.