Open georg-jung opened 9 months ago
Tagging subscribers to this area: @dotnet/area-extensions-filesystem See info in area-owners.md if you want to be subscribed.
Author: | georg-jung |
---|---|
Assignees: | - |
Labels: | `area-Extensions-FileSystem` |
Milestone: | - |
I don't have (or don't remember) context, but is there already an established pattern in file watcher for exceptions on its threadpool threads? If so any interest in offering a PR applying it in this path?
Sorry for the delay. I don't think there is a very clear established pattern.
If PhysicalFileWatcher
uses the FileSystemWatcher backend, it handles the FileSystemWatcher.Error Event
, which's docs read (emphasis mine):
This event is raised whenever something prevents the FileSystemWatcher object from monitoring changes. For example, if the object is monitoring changes in a remote directory and the connection to that directory is lost, the Error event is raised.
In that case, the listeners will be notified:
private void OnError(object sender, ErrorEventArgs e)
{
// Notify all cache entries on error.
foreach (string path in _filePathTokenLookup.Keys)
{
ReportChangeForMatchedEntries(path);
}
}
The polling implementations PollingWildCardChangeToken.cs
and PollingFileChangeToken.cs
do however not contain any try-catch blocks at all.
From the top of my head there are multiple options:
PollingWildCardChangeToken
and PollingFileChangeToken
signal on error. They are public types though and if they are not used in the context of PhysicalFileWatcher
, errors can be catched when get
ing the HasChanged
property's value.PollingWildCardChangeToken
and PollingFileChangeToken
. This would be a new public API though.PhysicalFilesWatcher
where it polls the tokens. This would get the polling behaviour in line with FileSystemWatcher
-backed instances. It also wouldn't add/change any public API surface. It also wouldn't change any public API's behaviour - except from not taking down processes due to uncatchable exceptions on the thread pool - and signalling the listeners instead.Thus the third option seems to be preferrable to me. It also seems like an almost non-invasive change (any chance to get this in a servicing release?) to me.
I'd be happy to create a PR when we agree on how to proceed 👍.
+1 to this. I have also encountered the same problem there is currently no way to recover when the connection to the remote directory is lost.
If PhysicalFileWatcher uses the FileSystemWatcher backend, it handles the FileSystemWatcher.Error Event
There are also uncaught exceptions on FSW when certain folders can't be enumerated, see callstack in https://github.com/dotnet/runtime/issues/91879.
There's a 4th option, DirectoryInfoWrapper
uses _directoryInfo.EnumerateFileSystemInfos("*", SearchOption.TopDirectoryOnly)
that doesn't enable EnumerationOptions.IgnoreInaccessible
. I would expect that using such option could fix this case.
However, it's been historically annoying to deal with unexpected exception in RaiseChangeEvents
:
Wrapping the whole method body in a try-catch could also be best if its safe to do so.
Judging from it's docs alone I don't think using IgnoreInaccessible
would fix cases where a network share goes down:
Gets or sets a value that indicates whether to skip files or directories when access is denied (for example, UnauthorizedAccessException or SecurityException).
Looking at it's source I'm unsure for which cases EBADF
stands. The explicit "Host is down" message in my original stack trace could however indicate the internal error code is rather Error_EHOSTDOWN
. Which means, if I understand it correctly, option 4 wouldn't fix this issue.
I see, unless we modify IgnoreInaccessible
on Linux, option 4 won't work. @georg-jung would you like to explore option 3 and send a PR?
Description
I use
Microsoft.Extensions.FileProviders.Physical.PhysicalFilesWatcher
in polling mode. If polling fails with an IOException, it takes the process down as the exception is thrown on a thread pool thread where it can not be catched.Related: #71003 #72462 Reading the related issues, it seems like @Jozkee is the right one to tag :-) cc: @danmoseley
Stacktrace
Details of my setup/environment
My code runs inside an ubuntu-chiseled-based docker container. On the unix container host, I have a CIFS mount defined in
/etc/fstab
like this:This mount is then further passed to the container:
The mounted volume is only available at specific times though (say, 9 to 5 but not at night). Thus it will predictably go down and when it does it takes my worker process with it.
Reproduction Steps
Creation of my PhysicalFilesWatcher is quite straight forward:
I have
DOTNET_USE_POLLING_FILE_WATCHER=1
set.Point this PhysicalFilesWatcher to a path that becomes unavailable, in my case the host of a mounted network file system goes down. On the next poll, the process stops forcefully.
Expected behavior
PhysicalFilesWatcher does not throw any uncatchable exceptions.
One of:
Actual behavior
PhysicalFilesWatcher takes processes down because it throws uncatchable exceptions.
Regression?
I don't think this is a regression.
Known Workarounds
Only: Don't use PhysicalFilesWatcher.
Configuration
docker container run --rm --entrypoint dotnet my-worker:latest --info
Other information
No response