Closed JeremyKuhne closed 4 years ago
OK... Now I'm massively confused. The reason I got started on this was dotnet/runtime#21678 which started from another issue https://github.com/dotnet/corefx/pull/19716#discussion_r116282222 which started because checking that file exists || directory exists is inefficient. However, looking at this code, it seems to return true for File.Exists
when that "file" is a directory.
FileSystem.FileExists
does not appear to work that way in the current codebase, nor does FileSystem.DirectoryExists
. So, what gives?
Hmm... maybe that comment is just wrong here: https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L176-L177
OK... Now I'm massively confused
Sorry :)
it seems to return true for File.Exists when that "file" is a directory
Shouldn't.
Hmm... maybe that comment is just wrong here:
Looks like it. :)
OK... walking back from the ledge now. Let's see... so.
Are you saying you recommend File.Exists
and Directory.Exists
directly use FileSystem.Current.FileExists
and FileSystem.Current.DirectoryExists
as those generally do the right thing and seem like they don't throw exceptions? I forgot, they do seem to require a full path though, right, or is that not generally true? I was wondering about the removing of the trailing \
in DirectoryExists
and whether that should really be done or not in FileSystem
. Have you seen that bit? I'd be afraid to change that at this point, obviously, as it likely affects a LOT of stuff upstream.
This is related to the other (#19717) in that checking for existence period isn't efficient. You have to check twice (#19717) when you don't care if the existing path is a file or directory, which is a pretty common scenario.
Underlying that scenario is this one- if you do care if it is a file or directory, the existing methods do way more work than they need to, notably in Windows. We pre-normalize the path, but the Windows API does that anyway when you get the attributes (which I presume is also the case for Unix). The current FileSystem impl throws, then we catch and turn it to false- which is also inefficient- allocating strings and exceptions then throwing them away. :(
The Unix impl may actually benefit from pre-normalization as it calls multiple APIs. We'd have to validate.
We don't want to change the existing FileSystem.Current.FileExists
until we've vetted that nobody cares about the exception state. Simpler to start with a new method and transition a bit at a time I think.
Trimming the trailing separator is still something we'd want to keep.
If you don't want to do this, that's fine, but I'm more than happy to help you through it if you do. I've got enough experience here that I can pretty confidently review your changes. :)
I'd be happy to tackle this one. questions...
It seems like the "common" function I want is something like a (not yet existing) FileSystemEntryType FileSystemEntry.Lookup(string path)
with
enum FileSystemEntryType { DoesNotExist, File, Directory, ...? }
that doesn't ever throw on anything but rather just returns DoesNotExist
if you don't have privileges to see it, for instance. It also seems like we'd want to use this in CoreCLR for the (not yet existing) Path.Exists
. Should I just start with building something like this internal on FileSystem
and then figure we'll move it later to CoreCLR as the time permits and api review is completed for Path.Exists
and dotnet/runtime#21678?
Also, it seems like we need to follow symlinks in this one, right? Just want to confirm.
DoesNotExist if you don't have privileges to see it, for instance.
I've always been conflicted about how to represent the concept in a way that is easy to understand. CanNotDetermine
or perhaps both NotFound
and AccessDenied
? In my old team I actually needed to know whether it was a rights issue so we could request the information from a higher-privilege process- it might be nice to have that available for a future API (perhaps the Path.Exists
, even!).
Should I just start with building something like this internal on
FileSystem
That is what I would do- much easier to iterate.
Also, it seems like we need to follow symlinks in this one, right?
Yeah, as that is the default historical (Windows) behavior.
@JeremyKuhne I've taken the first steps in addressing this one. Can you have a look over the commit referenced above (kellypleahy/corefx@1a3842c) and see if this is along the lines you were thinking? My general philosophy was to pass through all errors that were easily captured from the underlying system (for free) to the caller as enum values. The enum values that are consistent between the two OSes are reused - some can only come from one or the other OS (for example, Socket
can only happen on *nix while InvalidDrive
can only happen on Win32).
I haven't done anything to use this new function yet. I assume we'd use it in File.Exists
and Directory.Exists
as well as the places where the other ticket (#19717) recommends eliminating the double-check for File.Exists(...) || Directory.Exists(...)
.
Next steps - I don't know how best to test some of these cases. Some of the tests should be straightforward, but for some of them I need to create a path that has a folder in it that is not accessible to the user running the LookupEntry
function. I'm not sure if that's possible with the current test structure we have in corefx. Do you know if that's possible to change user during the test and verify things using that? Other cases require a valid and invalid symlink, as well as a cycle of symlinks.
cc @karelz re: testing above ^
also,... might be good to assign this one to me so nobody else grabs it ;)
Your wish was granted 😆, it's now assigned to you. (BTW: Thanks for your help!)
Re: Testing - I don't think we have abilities to switch users in test runs. It is very corner-case scenario and our infra was focused on mainline scenarios so far. I let @JeremyKuhne advice how best to go from here. Maybe having manually-run tests with special runner / prep step (which would create dirs on another account)? Maybe just one-off manual testing?
is along the lines you were thinking?
It is a good start. Some more outstanding stuff (to make sure they're in your todo list):
FillAttributeInfo
: Remove tryagain
. We should only ever make a single attempt at Interop.Kernel32.GetFileAttributesEx
, no reason to use FindFirstFile
.Path.GetFullPath()
) and replace with an embedded null check like Path.GetFullPath
does.My general philosophy was to pass through all errors that were easily captured from the underlying system
I think that is fine, as long as we don't surface them in our public API, as they don't align nicely cross-plat. Passing them back internally like this may make it easier to modify/add new API that depends on this helper. I would also make sure that .Directory
and .File
are set to match what Exists
returns- that means either merging the other FileTypes or turning this into a Flags enum. Don't want people accidentally changing the semantics of what a "file" is. I'm currently thinking merging is better than Flags.
Do you know if that's possible to change user during the test and verify things using that?
Afaik we haven't written anything to do this yet. A simpler answer might be to remove file rights (but not ownership), then restore them? We might also be able to find OS files that we can't hit? (Don't know any off the top of my head.)
From this point I'd move towards an official PR for discussion after you make more changes- you can mark it as [WIP] if we're still hashing things out.
From this point I'd move towards an official PR for discussion after you make more changes- you can mark it as [WIP] if we're still hashing things out.
OK cool. How do I "mark it as WIP"? Do I just put [WIP] in the PR title?
Do I just put [WIP] in the PR title?
yep
@kellypleahy Just a heads up- I've got to make some changes around this area for UAP work. dotnet/corefx#21149 changes the core FillAttributes- I'll drop mentions of any other changes that cross paths with this issue.
Thanks, I got a bit distracted but will be jumping back in this weekend. I can always rebase when you are done with your changes.
On Fri, Jun 16, 2017, 16:01 Jeremy Kuhne notifications@github.com wrote:
@kellypleahy https://github.com/kellypleahy Just a heads up- I've got to make some changes around this area for UAP work. dotnet/corefx#21149 https://github.com/dotnet/corefx/pull/21149 changes the core FillAttributes- I'll drop mentions of any other changes that cross paths with this issue.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/20876#issuecomment-309158174, or mute the thread https://github.com/notifications/unsubscribe-auth/AADP2NIbJZNl-Mw7Dd4n6TaVc-WTQfofks5sEwk_gaJpZM4N1rFh .
I have been very busy lately and haven't been able to do anything additional on this. I'm hoping to get back into it in a week or so. If you deem it important to get started earlier, please feel free to reassign - I don't want to delay anyone who is waiting on this.
No worries. If someone wants to grab it, they can speak up. It is in Future milestone, which means we won't look at it ourselves anytime soon.
Happy to reassign if you free up...
Costed at 3-4 days if @JeremyKuhne does it. Depends on dotnet/corefx#25539 to add Span overloads first.
I'd really love to know the perf numbers of the File.Create, File.Exists and File.Delete operations cross platform. In our app (Exceptionless), we've noticed better performance on on windows and we are trying to track it down. Currently we are looking to replace all unneeded operations in our storage provider and just catch exceptions. https://github.com/FoundatioFx/Foundatio/blob/master/src/Foundatio/Storage/FolderFileStorage.cs
@niemyjski Exists isn't horrible- you'd have to call it an awful lot to get a noticeable hit.
Directory enumeration is the most expensive API we have- a good place to focus your investigation. I recently did a big update for Windows with enumeration perf and I'm looking to provide high-performance extension points cross plat. dotnet/corefx#25873
If you get some measurements of where your app is spending time in System.IO that you think can be improved feel free to open issues and loop me in.
@Anipik this is one you could look at when you have finished other tasks and have free time later. @JeremyKuhne can give more details.
I forgot to post this on here when I was doing profiling work some time ago.
@niemyjski I made significant improvements to FillAttributeInfo. The marshaling involved now uses blittable structs and it only falls back to FindFirstFile in the needed error case. Some of the marshaling changes are in 4.7.2.
The key place the existing code is still bad is that it try/catches for non-existence. That is the first thing we should try to remove here. Additionally we can look at removing the normalization (e.g. Path.GetFullPath). The OS doesn't need us to do that work and we've removed almost all of the pre-checks from that code. If we do remove the GetFullPath() we would need to allow for the fact that the p/invokes assume normalization has occurred. That entails trimming trailing periods, spaces, and separators if the path isn't a device path. If it is a device path just the trailing separator has to be removed.
Not as big of an issue as it used to be. Closing.
Related to dotnet/runtime#21678
Exists methods have a try-catch for error cases and they normalize, which is not necessary (at least on Windows). As all error cases return false, we can remove most of this overhead.
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L181-L210
The only pre-emptive check we might want to keep from GetFullPath is to return false if the passed in path contains an embedded null.
Maybe have a
FileSystem.FileExistsFast
? This is the current Windows one:https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/File.cs#L181-L210
Unix is already in much better state. I assume that crappy and partial (relative) paths will work.
Tests for bad and relative paths are critical, need to validate existing coverage and expand if necessary.