Open ForNeVeR opened 6 months ago
Since @Kataane asked a question about the "file-system-aware" comparer in #84, I decided to elaborate on it here.
You see, in the real world, there is no such thing as a "case-sensitive operating system". There is a "case-sensitive path", or a "subtree", if you will. So, in the harsh reality, each path on the disk has its own comparison rules!
On Windows, you can control this on a per-path basis using fsutil file setCaseSensitiveInfo, see details here.
On macOS there are some other crazy ways to switch this, and on Linux, this is obviously at least a per-mount point thing (as most common drivers try to support Windows case-insensitivity natively).
The third path comparer would request this information from the actual file systems that are inspected, during path comparison, and use it when needed.
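One way to "request this information" from the file system can be sketched in Python. This is only an illustration of the idea, not a proposed implementation: the function name is_dir_case_sensitive is hypothetical, and the probing technique (create a temporary file, then look it up under a different casing) is a common heuristic that I am assuming here instead of a platform-specific query API. It needs write access to the directory.

```python
import os
import tempfile


def is_dir_case_sensitive(directory: str) -> bool:
    """Probe whether names directly inside `directory` are case-sensitive.

    Heuristic sketch (hypothetical helper): create a temporary file, then
    try to find it again under a name with swapped letter case.
    """
    # The prefix contains letters, so swapping the case always changes the name.
    fd, path = tempfile.mkstemp(prefix="cs_probe_", dir=directory)
    os.close(fd)
    try:
        head, name = os.path.split(path)
        swapped = os.path.join(head, name.swapcase())
        # If the swapped name resolves to the probe file, lookups ignore case.
        return not os.path.exists(swapped)
    finally:
        os.remove(path)
```

A real comparer would probably prefer a read-only query where the platform offers one, since creating files is slow and fails in read-only directories.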
In particular, let's imagine this scenario: you are on Windows, and have the following directory structure:
C:\ [case-insensitive, default]
C:\Path [case-sensitive]
C:\Path\Subpath [case-sensitive]
C:\Path\Subpath\Insensitive [case-insensitive, say it was manually restored after creating this dir]
And our comparer is asked a question: are paths C:\Path\SubPath\Insensitive and C:\Path\Subpath\Insensitive equal or not?
I imagine it should work like this:
1. C:\: equal in both paths, good
2. Path: equal in both, still good
3. Subpath vs SubPath: not equal, investigation required
   - C:\Path\ is checked for case sensitivity; it is case-sensitive, so the component comparison yields false: paths are different

So, as the result of comparing paths C:\Path\SubPath\Insensitive and C:\Path\Subpath\Insensitive, we get the result false, and the cache (that might be kept per comparer instance for now) gets information about C:\Path\ (that its children are stored in a case-sensitive way).
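The walk above can be sketched as follows. This is a rough Python illustration under stated assumptions, not the library's API: the class name FileSystemAwareComparer, the injected probe callback, and the plain dictionary cache are all hypothetical. The sketch also assumes that the drive root is compared case-insensitively and that str.lower() is a good enough case fold, which a real implementation would need to revisit.

```python
from pathlib import PureWindowsPath
from typing import Callable, Dict


class FileSystemAwareComparer:
    """Hypothetical sketch of a component-by-component path comparer."""

    def __init__(self, is_case_sensitive: Callable[[str], bool]):
        # `is_case_sensitive(directory)` is an assumed callback that tells
        # whether the children of `directory` are stored case-sensitively.
        self._probe = is_case_sensitive
        self.cache: Dict[str, bool] = {}  # per-instance sensitivity cache

    def _children_case_sensitive(self, directory: str) -> bool:
        if directory not in self.cache:
            self.cache[directory] = self._probe(directory)
        return self.cache[directory]

    def equals(self, a: str, b: str) -> bool:
        parts_a = PureWindowsPath(a).parts
        parts_b = PureWindowsPath(b).parts
        if len(parts_a) != len(parts_b):
            return False
        parent = None
        for x, y in zip(parts_a, parts_b):
            if x != y:
                # Components that differ by more than case are never equal.
                if x.lower() != y.lower():
                    return False
                # They differ only by case: ask the parent directory.
                # Assumption: the root (drive) itself is case-insensitive.
                if parent is not None and self._children_case_sensitive(str(parent)):
                    return False
            parent = PureWindowsPath(x) if parent is None else parent / x
        return True


# Usage with a fake probe that mirrors the directory layout from the example.
sensitive_dirs = {r"C:\Path", r"C:\Path\Subpath"}
comparer = FileSystemAwareComparer(lambda d: d in sensitive_dirs)
print(comparer.equals(r"C:\Path\SubPath\Insensitive",
                      r"C:\Path\Subpath\Insensitive"))  # prints False
```

Only the parent of the first mismatching component is probed, so the cache ends up holding a single entry for C:\Path, which matches the scenario described above.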
Obviously, this will require quite a lot of work from us, and it will be quite slow in practice (orders of magnitude slower than the default comparers). But I believe it is a "must have" feature of a file system path library.
I suggest the following changes.
Introduce three different path comparer kinds.
StrictStringPathComparer?).
[ ] File-system-aware comparer: for each compared path component, it should check the actual case sensitivity of the corresponding file system subroot. For non-existent paths, it should use the platform-dependent policy for determining the case sensitivity of new subdirectories (is it normally inherited from the parent directory?).
This one is obviously IO-intensive, so I'm thinking of introducing some sort of "sensitivity cache" that'd store the lists of checked paths and subtrees in a trie data structure, and would be used for one or multiple operations (probably one per comparer instance, with the ability of manual reset).
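To make the "sensitivity cache" idea more concrete, here is a minimal trie sketch in Python. Everything in it is an assumption for illustration: the names SensitivityCache, set, get and reset are hypothetical, path components are used as trie keys with their exact spelling, and Windows-style paths are split with PureWindowsPath.

```python
from pathlib import PureWindowsPath
from typing import Dict, Optional


class _Node:
    """One path component in the trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        # None means "not checked yet"; True/False is the cached answer.
        self.case_sensitive: Optional[bool] = None


class SensitivityCache:
    """Hypothetical trie that remembers per-directory case sensitivity."""

    def __init__(self) -> None:
        self._root = _Node()

    def set(self, directory: str, case_sensitive: bool) -> None:
        node = self._root
        for part in PureWindowsPath(directory).parts:
            node = node.children.setdefault(part, _Node())
        node.case_sensitive = case_sensitive

    def get(self, directory: str) -> Optional[bool]:
        node = self._root
        for part in PureWindowsPath(directory).parts:
            child = node.children.get(part)
            if child is None:
                return None  # this subtree was never inspected
            node = child
        return node.case_sensitive

    def reset(self) -> None:
        """Manual reset: forget everything that was cached."""
        self._root = _Node()
```

Because keys keep their exact spelling, two spellings of the same case-insensitive directory would land in different nodes. In this sketch that only costs an extra file system check on a cache miss, but a real design might want to normalize keys below directories already known to be case-insensitive.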