dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Define and add `FileSystemError` enum and matching property to IOException #926

Open Neme12 opened 5 years ago

Neme12 commented 5 years ago

(similar to dotnet/corefx#34220)

Scenario: I want to create and write some default data into a file unless it already exists.

A naive implementation would be:

if (!File.Exists(path))
{
    using (var fileStream = File.Open(path, FileMode.CreateNew))
    {
        // ...
    }
}

The problem is that this touches the file system at two separate points in time. As @jaredpar has taught me, File.Exists is evil: just because File.Exists returned false, there is no guarantee that the file still doesn't exist by the time File.Open is called.

To be robust, we should just call File.Open and catch the exception it throws in case the file already exists. The problem is that it throws System.IO.IOException with a message of "The file already exists". There is no specific exception type for this scenario. At first it would seem that the only thing we can do is catch the exception depending on its message string (which is a terrible idea), but luckily, there is a specific HResult for this failure, leaving us with:

Stream fileStream = null;

try
{
    fileStream = File.Open(path, FileMode.CreateNew);
}
catch (IOException e) when (e.HResult == -2147024816) // 0x80070050: HRESULT for Win32 ERROR_FILE_EXISTS
{
}

if (fileStream != null)
{
    using (fileStream)
    {
        // ...
    }
}

This works OK, but it makes the code hard to read and maybe even brittle, because we're depending on a magic constant that comes from somewhere in the Windows API. I'm not sure if this code works outside of Windows at all, but even if it does, it's definitely not obvious.
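
For illustration, the workaround above could be hidden behind a small helper so the magic number appears only once and under a name. This is a minimal sketch, not a BCL API: `FileCreation`, `TryCreateNew`, and `HResultFileExists` are hypothetical names, and the constant is the Windows HRESULT from the snippet above, so other platforms may report a different HResult and would then see the exception propagate.

```csharp
using System;
using System.IO;

static class FileCreation
{
    // 0x80070050 == -2147024816: the Win32 ERROR_FILE_EXISTS code wrapped as an HRESULT.
    // Assumption: we are on Windows; other platforms may use a different value.
    private const int HResultFileExists = unchecked((int)0x80070050);

    // Hypothetical helper: returns the new stream, or null if the file already existed.
    // Any other IOException is not caught and propagates to the caller.
    public static FileStream TryCreateNew(string path)
    {
        try
        {
            return File.Open(path, FileMode.CreateNew);
        }
        catch (IOException e) when (e.HResult == HResultFileExists)
        {
            return null;
        }
    }
}
```

A caller could then write `using (var stream = FileCreation.TryCreateNew(path)) { if (stream != null) { /* write defaults */ } }`, since a `using` statement accepts a null resource.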

Please add a dedicated exception type for this kind of failure.

SweetShot commented 5 years ago

Hello @danmosemsft

I looked at the places that throw IOException in the coreclr shared code; they don't introduce any new categories that would need to be added to the FileSystemError enum.

I validated the proposed enum against all the possible causes, considering only the file system for now. The enum covers every case except moving across volumes (which I have added). I have also added comments for additional errors that can map to the same enum value. As for InvalidName and InvalidPath, I think they should be kept separate, since they come from two distinct coding errors by the developer rather than from the OS.

Also, InsufficientResources and OutOfMemory sound similar, at least to me, although we know they denote two separate conditions. Maybe InsufficientResources could be renamed TooManyHandles; that would also cover files, since files are handles anyway.

    public enum FileSystemError
    {
        FileNotFound,
        FileAlreadyExists,
        DirectoryNotFound,
        DirectoryAlreadyExists,
        DirectoryNotEmpty, // When removing a directory
        DiskNotFound, // or "Volume"? Same as device unavailable?
        DiskFull, // or "InsufficientSpace"?
        DiskNotReady, // Disk exists, but cannot be used. Do we need "DiskReadOnly"?
        MoveAcrossVolumes, // can be clubbed with InvalidPath? i.e. dest path for move?
        AccessDenied, // Insufficient privilege. Also for invalid mode? Write on read-only? Bad file mode?
        FileInUse, // Sharing violation, already open
        FileTooLarge,
        InvalidName, // Invalid characters in name
        InvalidPath, // Path too long, malformed? Same as InvalidName case?
        InvalidHandle,
        WriteFailure, // Random write failure, or beyond end, etc. Also for io_http/net_io
        ReadFailure, // Same for read
        InvalidStream, // Bad stream or invalid location in stream passed to the API; maps to SeekBeforeBegin/AfterEnd
        TooManyHandles, // Too many open files, handles
        OutOfMemory, // Memory-mapped files
        CopyToSelf, // Cannot copy over self
        DecompressionFailure, // Invalid content of compressed file; maps to unexpected end of stream, dir name with data, extract outside zip
        RemotePathNotFound, // ??? something for trying to write to \\doesnotexist\foo? Server not found? Share not found?
        Undefined, // We would set this for all other cases, including non-filesystem issues; hopefully the HRESULT is useful instead, or at least the message text...
    }
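
To make the idea concrete, here is a rough sketch of how a few of these values might be derived from the HResult that IOException already carries. Everything in it is an assumption for illustration: the enum is a cut-down, hypothetical subset of the proposal, `FileSystemErrorMapper.FromHResult` is not an existing API, and only HResults built from well-known Win32 error codes (0x8007xxxx) are recognized. Anything else falls through to Undefined.

```csharp
using System;

// Hypothetical subset of the proposed enum; not a shipped .NET type.
public enum FileSystemError
{
    FileNotFound,
    FileAlreadyExists,
    DirectoryNotFound,
    DirectoryNotEmpty,
    DiskFull,
    MoveAcrossVolumes,
    AccessDenied,
    FileInUse,
    Undefined,
}

public static class FileSystemErrorMapper
{
    // Assumption: the HResult wraps a Win32 error code (failure bit set, FACILITY_WIN32).
    public static FileSystemError FromHResult(int hresult)
    {
        // The upper 16 bits of such an HRESULT are 0x8007; anything else is unknown here.
        if ((unchecked((uint)hresult) >> 16) != 0x8007u)
            return FileSystemError.Undefined;

        // The lower 16 bits hold the original Win32 error code.
        switch (hresult & 0xFFFF)
        {
            case 2:   return FileSystemError.FileNotFound;      // ERROR_FILE_NOT_FOUND
            case 3:   return FileSystemError.DirectoryNotFound; // ERROR_PATH_NOT_FOUND
            case 5:   return FileSystemError.AccessDenied;      // ERROR_ACCESS_DENIED
            case 17:  return FileSystemError.MoveAcrossVolumes; // ERROR_NOT_SAME_DEVICE
            case 32:  return FileSystemError.FileInUse;         // ERROR_SHARING_VIOLATION
            case 80:  return FileSystemError.FileAlreadyExists; // ERROR_FILE_EXISTS
            case 112: return FileSystemError.DiskFull;          // ERROR_DISK_FULL
            case 145: return FileSystemError.DirectoryNotEmpty; // ERROR_DIR_NOT_EMPTY
            default:  return FileSystemError.Undefined;
        }
    }
}
```

With something like this, a caller could write `catch (IOException e) when (FileSystemErrorMapper.FromHResult(e.HResult) == FileSystemError.FileAlreadyExists)` instead of comparing against a raw number.
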

joshudson commented 5 years ago

You've missed quite a lot that I've actually had to handle.

InvalidFunction or NotSupported, // Windows calls this InvalidFunction while *nix calls this ENOTSUP; not sure which is better
TooManyLinks, // Too many hard links to this file
OutOfFiles, // cannot create any more files on this device
TooManySymbolicLinks, // ELOOP: path traversal tried to descend through too many symbolic links
ReadChecksum, // disk reported CRC error on read
IsADirectory, // tried a file operation on a directory
IsNotADirectory, // tried a directory operation on a file
NotATerminal, // tried a terminal operation on a pipe
NotSeekableDevice, // tried to seek on something that isn't seekable
BrokenPipe, // tried to write to a pipe when the reader is closed
QuotaExceeded, // there's plenty of space on the disk, but not for this account
OperationNotSupportedOnSymbolicLink, // Windows does not allow a hard link to a symbolic link
RemoteIOError, // WriteFailure or ReadFailure on the other end of a network filesystem reports this; I suppose you *could* fold this into WriteFailure or ReadFailure; I only handle this one specifically for generating better error messages.
DeletePending // tried to open a file with a pending delete; can possibly be folded into FileInUse

.

InsufficientResources,   // too many open files, handles

That name is rather horrible. TooManyOpenFiles would be much better. Ninad Sheth is already confused by the name.

Also, clubbing MoveAcrossVolumes into something else would be bad for me. I handle this one specifically too.

danmoseley commented 5 years ago

Thanks @joshudson.

Some of these are fine-grained distinctions.

  1. Would the caller know what to test for? For example, if I try to create a directory at the location of an existing file, would the caller know whether to expect IsADirectory or FileAlreadyExists?
  2. Can we make the distinctions on all platforms from the error code without extra cost? (For example, when CreateDirectory fails on Windows with ERROR_ALREADY_EXISTS, we already have to do more IO to determine whether it was a file or a directory that already existed.)
  3. Are these meaningful and well defined on all platforms, if that matters? Part of the purpose here is abstracting platforms.
  4. What granularity does the app need? If it is not much (e.g. the app wants to choose between "ask user for a new location" and "just show IO failed"), then testing against many possibilities means some verbose code -- unless this becomes a big [Flags] enumeration with some catch-all entries...
  5. Is Undefined an appropriate catch-all, given that in many cases we will have a well-defined, distinct error code that simply isn't represented in the enumeration (yet)?

@joshudson, I'm guessing you want this to be as granular as possible, since you're using HRESULT already. @arunjvs is this also true for your case?

joshudson commented 5 years ago

For example if I try to create a directory at the location of an existing file, would the caller know whether to expect IsADirectory or FileAlreadyExists?

IsADirectory is never the result of a name collision; it is only ever the result of trying to do something like new FileStream(@"C:\", ...). The same general idea applies to IsNotADirectory.

On looking back at it, I think we can fold QuotaExceeded into DiskFull.

Specific cases that come up a lot:

try {
    // ...
    CreateHardLink();
    // ...
} catch (InvalidFunction) {
    /* Use the FAT algorithm */
}

try {
    new FileStream(...);
} catch (IsADirectory) {
    /* recursive call */
}

try {
    file.Position = someBigValue;
} catch (NotSeekableDevice) {
    /* read-discard loop to get to new position */
}

for (int tries = 0; tries < number; tries++)
{
    try {
        file = new FileStream(...);
        break; // opened successfully, stop retrying
    } catch (FileInUse) {
        Thread.Sleep(30);
    }
}

try {
    DeviceIOControl();
} catch (NotATerminal) { /* Yoda would approve */ }

try {
    // ...
} catch (BrokenPipe) {
    throw new ThreadAbortException();
} catch (IOException) {
    /* stuff to do */
}
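
For comparison, the FileInUse retry loop above can be approximated today by filtering on HResult. This is only a sketch under assumptions: `SharedFileOpener` and `OpenWithRetry` are made-up names, and 0x80070020 is the Windows HRESULT for ERROR_SHARING_VIOLATION, so on other platforms the filter may never match and the first failure would simply propagate.

```csharp
using System;
using System.IO;
using System.Threading;

static class SharedFileOpener
{
    // 0x80070020: Win32 ERROR_SHARING_VIOLATION wrapped as an HRESULT (Windows-specific assumption).
    private const int HResultSharingViolation = unchecked((int)0x80070020);

    // Hypothetical helper: tries to open the file for reading up to maxTries times,
    // sleeping delayMs between attempts while another process holds the file.
    public static FileStream OpenWithRetry(string path, int maxTries, int delayMs)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
            }
            // On the last attempt the filter is false, so the exception reaches the caller.
            catch (IOException e) when (e.HResult == HResultSharingViolation && attempt < maxTries)
            {
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

Unlike the pseudocode, the final failure is rethrown rather than swallowed, so the caller never ends up with an unassigned stream.
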

In theory, the following reduce mappings work well enough:

I have never had to catch InvalidStream but only because seeking past the end is valid.

danmoseley commented 5 years ago

I created a table of the error code mappings I know of and where possible, specific causes. I also checked this against the various mappings in .NET Core itself. If you have suggestions to improve/fill in this table, let me know and I will update it.

Perhaps this will help us identify a reasonable granularity of causes, that are also meaningful across platforms.

https://gist.github.com/danmoseley/ee382954ebdda9807d54a569dd662eb4

danmoseley commented 5 years ago

https://github.com/gapotchenko/Gapotchenko.FX/blob/master/Source/Gapotchenko.FX.IO/IOExceptionExtensions.cs demonstrates one case.

danmoseley commented 5 years ago

@joshudson @sweetshot @tmds any feedback on my table?

Incidentally, as a separate issue, one could argue for a Path field on IOException. Most often the path is only accessible by parsing the message string.

joshudson commented 5 years ago

I posted some comments on the gist. I don't see anything to add to them.

danmoseley commented 5 years ago

@joshudson my bad, I missed those somehow. thanks.

tmds commented 5 years ago

@danmosemsft I had not picked up on this issue. I'll take a look at the issue and the table in the coming week.

danmoseley commented 5 years ago

@tmds for sure no urgency on this one. At this point it must be a post-3.0 change if we do anything here.

tmds commented 5 years ago

I've read through the issue. Once the enum is there, its Undefined value can later be split into more specific values (users should add a catch for IOException rather than check for Undefined). Other values cannot be split, because that would break existing code.

The added value of the enum is being able to handle certain specific values (Exists being the one requested in the issue). Programs won't be able to handle all cases, so for some errors, the UI will show the Exception.Message.

Perhaps we should keep the enum small to begin with, and map most errors to Undefined. We can see on a per-method basis which errors the programmer may want to handle explicitly.

danmoseley commented 3 years ago

the link to my table broke due to my ID change. fixed link: https://gist.github.com/danmoseley/ee382954ebdda9807d54a569dd662eb4

added up-for-grabs in case anyone is interested in taking this further. the next steps are

  1. look at the table, and other feedback, and propose an enumeration for API review. as @tmds points out, it can start small.
  2. once we have consensus here, we can take it for review. we should also ask them for guidance on how breaking it is to change error codes later, e.g. if we subdivide a code. if that's quite breaking, we should probably aim for a comprehensive enum up front.

CyberSinh commented 7 months ago

Sometimes the HResult of the IOException is not set when it could be, which makes identifying the underlying problem really tricky; see for example https://github.com/dotnet/runtime/issues/100457