dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.28k stars 4.73k forks source link

FileSystemWatcher on Linux uses an excessive amount of resources #62869

Open neilmayhew opened 2 years ago

neilmayhew commented 2 years ago

Description

There are many reports on the web of running into this message:

System.IO.IOException : The configured user limit (128) on the number of inotify instances has been reached, or the per-process limit on the number of open file descriptors has been reached.

The Linux implementation of FileSystemWatcher creates a new thread with its own inotify instance for every directory that's watched. This is very expensive in kernel resources, and often hits system limits as can be seen from the message. It's particularly problematic when running in a container because the per-UID quota isn't namespaced and is therefore shared among all containers on the same host. To make things worse, many ASP.net applications running in a container are incorrectly configured and use more FileSystemWatchers than they need to. Such a container consumes much more than its fair share of the available inotify instances, which then causes other containers in the pod (eg with k8s) to fail unexpectedly.

Instead, the implementation should have a single thread with a single inotify instance that's shared by all the FileSystemWatchers in the process. Instead of adding new instances (which are expensive) the implementation would add new watches (which are cheap). It would then be possible to accommodate even the most egregious of ASP.net apps with ease..

Configuration

Any Linux system.

Regression?

No. This has been going on for a long time and there are a lot of puzzled blog posts about it.

Data

Blog posts:

Related issues:

Analysis

The inotify system was never intended to be used like this.

However, rather than changing the implementation to use a single shared thread, it might be a better use of resources to write a new implementation using fanotify as suggested in:

dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Description There are many reports on the web of running into this message: > System.IO.IOException : The configured user limit (128) on the number of inotify instances has been reached, or the per-process limit on the number of open file descriptors has been reached. The Linux implementation of `FileSystemWatcher` creates a new thread with its own inotify [instance](https://man7.org/linux/man-pages/man2/inotify_init.2.html) for every directory that's [watched](https://man7.org/linux/man-pages/man2/inotify_add_watch.2.html). This is very expensive in kernel resources, and often hits system limits as can be seen from the message. It's particularly problematic when running in a container because the per-UID quota isn't namespaced and is therefore shared among all containers on the same host. To make things worse, many ASP.net applications running in a container are incorrectly configured and use more `FileSystemWatcher`s than they need to. Such a container consumes much more than its fair share of the available inotify instances, which then causes other containers in the pod (eg with k8s) to fail unexpectedly. Instead, the implementation should have a single thread with a single inotify instance that's shared by all the `FileSystemWatcher`s in the process. Instead of adding new instances (which are expensive) the implementation would add new watches (which are cheap). It would then be possible to accommodate even the most egregious of ASP.net apps with ease.. ### Configuration Any Linux system. ### Regression? No. This has been going on for a _long_ time and there are a lot of puzzled blog posts about it. ### Data Blog posts: * [Travis J. Gosselin](http://blog.travisgosselin.com/configured-user-limit-inotify-instances/) * [Anthony Liriano](https://anthonyliriano.com/2021/04/08/21-configured-user-limit-128-on-the-number-of-inotify-instances-has-been-reached.html) Related issues: * #27272 ### Analysis * The per-directory inotify instance is created [here](https://github.com/dotnet/runtime/blob/57bfe474518ab5b7cfe6bf7424a79ce3af9d6657/src/libraries/System.IO.FileSystem.Watcher/src/System/IO/FileSystemWatcher.Linux.cs#L35). * The underlying [native function](https://github.com/dotnet/runtime/blob/7414af2a5f6d8d99efc27d3f5ef7a394e0b23c42/src/native/libs/System.Native/pal_io.c#L1263) calls [inotify_init](https://man7.org/linux/man-pages/man2/inotify_init.2.html). The inotify system was never intended to be used like this. However, rather than changing the implementation to use a single shared thread, it might be a better use of resources to write a new implementation using `fanotify` as suggested in: * #30495
Author: neilmayhew
Assignees: -
Labels: `area-System.IO`, `tenet-performance`, `untriaged`
Milestone: -
adamsitnik commented 2 years ago

@neilmayhew I agree that the current implementation is not perfect.

@tmds do you have any recommendations?

tmds commented 2 years ago

We can look at improving the implementation. Maybe some simpler changes can be made related to:

To make things worse, many ASP.net applications running in a container are incorrectly configured and use more FileSystemWatchers than they need to.

@neilmayhew can you give some more information about this?

neilmayhew commented 2 years ago

@tmds Sorry for the slow reply.

Both of the blogs I linked to in the description mention ASP.net, and refer to a solution that looks like this:

configuration = new ConfigurationBuilder()
                     .SetBasePath(Directory.GetCurrentDirectory())    
                     .AddJsonFile("appsettings.json", optional: true, reloadOnChange: false);

However, one of the comments on the first blog says that wasn't enough and links to a Stack Overflow question about WebApplicationFactory that has a recursive solution. That in turn links to another question, which doesn't seem to be about ASP.net specifically. So I don't think this is exactly a problem with ASP.net, more about the default behavior of ConfigurationBuilder, but it seems that the problem crops up more often with, or is made worse by, ASP.net.

neilmayhew commented 2 years ago

A web search for the error message in question turns up a zillion hits, so a lot of people are encountering this, but previously no-one seems to have figured out the root of the problem and everyone is just finding ways to work around it.

It should be possible to watch thousands of files and directories on Linux without much overhead, but that's not possible with the way it's implemented currently.

The solution I'm thinking of is to use class variables for a common inotify instance and background thread, instead of instance variables and an instance and thread per FileSystemWatcher object. The common thread would send events just like the single thread does, but it would need to keep a dictionary mapping from inotify watch descriptor to FileSystemWatcher so it can obtain the .Net event it needs to send when it reads an inotify event from the inotify instance. This would be only slightly more complex than the current implementation, and the complexity would be concerned only with mutexing and keeping the dictionary updated when FileSystemWatcher instances are created and destroyed.

neilmayhew commented 2 years ago

BTW, I don't think it would be appropriate to switch the implementation to use fanotify instead of inotify. Although fanotify is newer, it's designed for a different use case and isn't a replacement for inotify. I'll update the description.

neilmayhew commented 2 years ago

I see that the implementation already uses a mutexed dictionary for mapping watch descriptors to absolute paths (_wdToPathMap) so a lot of the infrastructure I discussed is already in place and could be repurposed quite easily.

suchoss commented 1 year ago

referenced pull request is not a proper fix, but a workaround... :(

wasabii commented 1 year ago

My view on this is there is a fundamental limitation in the API of FileSystemWatcher that causes this issue: it can only watch a single path. And because of that, user's are drawn to create multiple FileSystemWatchers, to watch multiple directories. And since a single FileSystemWatcher holds a single IOCP port, or inotify queue, this is artifically inflating the number of such queues that the system needs to allocate.

In IKVM we had been using FileSystemWatcher as the backing support for our java.nio.file.WatchService support. But, we ran into this issue. WatchService in Java starts out pretty much the same: user's create an instance, and that instance owns a queue or port. However, user's then register and unregister multiple potentially unrelated paths to listen to with the service. So, in that ecosystem, user's create WatchService instance less frequently. Usually one per problem domain. But then register large directory hierarchies, etc, with that single one. But they're in control of how many ports they allocate, and how many directories they watch with each.

So, as an API suggestion that might alleviate this: support multiple Paths. Yes, FileSystemWatcher.Path is a single property. And the class takes a single value in it's ctor. But maybe there's some clean way to extend this a bit.

neilmayhew commented 1 year ago

@wasabii Your understanding matches what I was saying in the description. However, I'm suggesting a solution that doesn't require any changes to the API. (I'm pretty sure changing the API isn't an option.)

In the solution I'm suggesting, there would be a single, global inotify queue for all FileSystemWatchers (ie a class variable) instead of having a separate inotify queue per FileSystemWatcher (ie an instance variable) as there is at present.

As I've outlined, I think this would be relatively simple to implement. I wish I had the time to do it myself but the job I'm in now doesn't use .Net and so I can't justify it.