e-dant / watcher

Filesystem watcher. Works anywhere. Simple, efficient and friendly.
MIT License

Kernel Features Tracker #10

Closed X-Ryl669 closed 1 year ago

X-Ryl669 commented 1 year ago

I've read the source code and was disappointed not to find any use of kernel primitives for watching filesystem changes (except on Darwin).

Why don't you use kernel primitives (like inotify on linux, for example)?

IMHO, it'll be a lot more efficient than computing a giant hash table of paths to monitor.

redcinelli commented 1 year ago

@X-Ryl669 It seems it is partially answered in this issue

For inotify, that's events potentially disappearing or being duplicated...

It seems like they chose not to use inotify out of safety concerns.

e-dant commented 1 year ago

Using inotify is something I've drifted back and forth on.

Here is a brief from the author of the LLFIO (this is from a conversation on Slack):

Also Linux's kernel API is racy and broken (both APIs), the Windows one has many gotchas, the only decent kernel API is BSD's. The BSD kernel API is accurate and performant, but if it gets overloaded due to too many filesystem changes it may report back "I give up" and then you need to fall back on a manual delta calculation.

@ned14 in what ways are the Linux kernel APIs broken (I’m assuming the inotify API, right?) — and are there solutions or workarounds?

They drop notifications, report the wrong changes, sometimes don't work at all. And appear to spasm between half working and not working in a bursty way if there is enough change happening on the filesystem. Windows and BSD at least have the decency to explicitly tell you "I give up".

I (am hoping to) disagree with some points there. Specifically, I'm looking for a balance between efficiency and safety, and to illustrate those caveats in the readme.

e-dant commented 1 year ago

But, yes, you're also right that manually computing deltas (with the already slightly sub-optimal unordered_map) isn't great.

There's a middle ground somewhere between the two. It requires more research into the broken parts of inotify.

ned14 commented 1 year ago

You'll always need a way of falling back to manual delta comparison with any of the change notification systems on any of the three major platforms. The best any kernel API can give you is hints about shortcuts to take.

e-dant commented 1 year ago

Solution: Bring inotify up. Write filesystem-event heavy tests that illustrate a difference in output between an inotify and the "baseline" warthog adapter. If the results are bad enough, revert. In any other case, make a section in the readme about the caveats.

Better solution: Do what Ned said.

ned14 commented 1 year ago

Personally, I'd make a delta generation implementation which works off scanning the filesystem and comparing graphs. I'd make that fast, which is doable (certainly 10 million items per second is doable). I'd then have inotify trigger the delta calculation, but also have a fallback timer so it always runs every few seconds anyway.

You might, if you wanted, only compare the subset of the graph which inotify tells you about, but then say every ten seconds do a full scan anyway.
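A minimal sketch of the "scan and compare" idea, assuming a snapshot is simply a map from path to last write time. The names `Snapshot`, `Change`, `scan` and `diff` are hypothetical and are not taken from the watcher's source; a real implementation would likely compare more than the write time.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <unordered_map>
#include <vector>

namespace fs = std::filesystem;

// One observed change. The field names echo the "what"/"where" keys in the
// watcher's output, but this struct is illustrative only.
struct Change {
  std::string what;  // "create", "destroy" or "modify"
  fs::path where;
};

// Path -> last write time for everything under a root.
using Snapshot = std::unordered_map<std::string, fs::file_time_type>;

// Walk the tree once and record what is there. Entries that cannot be read
// (permission denied, vanished mid-scan) are skipped rather than thrown.
inline Snapshot scan(fs::path const& root) {
  Snapshot snap;
  std::error_code ec;
  auto it = fs::recursive_directory_iterator(
      root, fs::directory_options::skip_permission_denied, ec);
  auto const end = fs::recursive_directory_iterator();
  while (!ec && it != end) {
    std::error_code time_ec;
    auto const t = fs::last_write_time(it->path(), time_ec);
    if (!time_ec) snap.emplace(it->path().string(), t);
    it.increment(ec);
  }
  return snap;
}

// Compare two snapshots and report what appeared, changed or disappeared.
inline std::vector<Change> diff(Snapshot const& before, Snapshot const& after) {
  std::vector<Change> changes;
  for (auto const& entry : after) {
    auto const old = before.find(entry.first);
    if (old == before.end())
      changes.push_back({"create", entry.first});
    else if (old->second != entry.second)
      changes.push_back({"modify", entry.first});
  }
  for (auto const& entry : before)
    if (after.find(entry.first) == after.end())
      changes.push_back({"destroy", entry.first});
  return changes;
}
```

Under this design, an inotify event would only decide when `scan` and `diff` run (and possibly on which subtree), while a timer would run them every few seconds regardless.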

inotify has issues on NFS and CIFS. And god only knows with something like Lustre. Certainly some heuristic which detects when inotify isn't telling you about notifications and restores regular polling should be easy enough to implement.

Good luck with it, making this fast, reliable and race free is challenging even for a domain expert.

e-dant commented 1 year ago

Preliminary results with an inotify-only adapter are not pretty. This is a snippet taken while running it with tool/test/dir:

`tell bun .` & ; sleep 5
tool/test/dir
"1666145122927923891":{"where":"122","what":"destroy","kind":"dir"},
"1666145122927966274":{"where":"
","what":"destroy","kind":"file"},
"1666145122927971681":{"where":"161","what":"destroy","kind":"dir"},
"1666145122928012833":{"where":"303","what":"destroy","kind":"file"},
"1666145122928016415":{"where":"303","what":"destroy","kind":"dir"},
"1666145122928058140":{"where":"336","what":"destroy","kind":"file"},
"1666145122928061756":{"where":"336","what":"destroy","kind":"dir"},
"1666145122928105673":{"where":"107","what":"destroy","kind":"file"},
"1666145122928109207":{"where":"107","what":"destroy","kind":"dir"},
"1666145122928154216":{"where":"464","what":"destroy","kind":"file"},
"1666145122928157753":{"where":"464","what":"destroy","kind":"dir"},
"1666145122928203024":{"where":"","what":"destroy","kind":"file"},

That's a lot of misreported and missing events. They're not sequential, some are bogus, and all of the create events were missed. inotify also seems to hang after some (large) number of directories:

`tell bun $HOME` & ; sleep 5
-- Configuring done
-- Generating done
-- Build files have been written to: /home/edant/smstore/watcher/build/out
[2/2] Linking CXX executable water.watcher
{"water.watcher.stream":{
top
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 886594 edant     20   0    8508   5636   3376 R  99.7   0.0   1:50.88 water.watcher

This is, somehow, worse than I remember inotify being the last time I wrote up a program like this. The preliminary results are a bit too bad.

I'll look into:

I'll create a branch (or just push this to next) after those points are explored.

This is preliminary work towards Ned's thoughts.

e-dant commented 1 year ago

Changed the title to better reflect the ongoing work

X-Ryl669 commented 1 year ago

fanotify is said to be lighter on resources and less racy with directory creation.

The "errors" you are reporting here seem to be well documented: inotify is limited in the number of items it can observe. So I guess it's hanging because you're not checking for errors when crossing this limit (and IIRC, it's very low, something like 16K).
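A rough sketch of the error check mentioned here, assuming Linux. `inotify_add_watch` returns -1 and sets `errno` to `ENOSPC` when the per-user watch limit (readable from `/proc/sys/fs/inotify/max_user_watches`) has been reached. The `WatchResult` enum and `add_watch_checked` helper are hypothetical names, not part of the watcher.

```cpp
#include <cerrno>
#include <cstdint>
#include <string>
#include <sys/inotify.h>
#include <unistd.h>

// Outcome of trying to watch one directory.
enum class WatchResult { ok, limit_reached, failed };

// Add a watch and classify a failure instead of ignoring it.
// On success, wd_out holds the new watch descriptor; otherwise it is -1.
inline WatchResult add_watch_checked(int inotify_fd, std::string const& dir,
                                     int& wd_out) {
  constexpr std::uint32_t mask = IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVE;
  wd_out = inotify_add_watch(inotify_fd, dir.c_str(), mask);
  if (wd_out >= 0) return WatchResult::ok;
  // ENOSPC: the user's watch limit was hit (or the kernel could not allocate
  // a needed resource). Anything else is reported as a generic failure.
  return errno == ENOSPC ? WatchResult::limit_reached : WatchResult::failed;
}
```

A caller that sees `limit_reached` could stop adding watches and fall back to polling the remaining directories rather than spinning.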

As for the missing creation events, I would say that's by design, even if it's an awful design (fanotify doesn't have this issue when monitoring a mount or filesystem). So inotify can't be used without mirroring the file hierarchy in your software, which makes it very memory-hungry.

fanotify, on the other hand, requires Linux kernel > 5.1 to be useful, but doesn't need to replicate the file hierarchy in memory (it can be regenerated on the fly from the received events).

e-dant commented 1 year ago

I had toyed with fanotify but could not get enough useful information in userspace. Perhaps I had set some of the wrong flags. I will revisit fanotify.

Perhaps we can check the user's id and switch to fanotify in that case. It's a wonderful API. I had a lot of fun with it as su.

For now, though, we're using inotify on Linux. It's a tricky API with a lot of caveats. I've managed to work through most of them, including the issues I posted above.


In the future, I'm looking to see where this implementation's shortcomings are and bolster them with sanity checks.

The same would need to happen with fanotify, albeit with less paranoia. (The docs on inotify say as much.)

A "parallel" warthog could run every little while and check up on the state of the file tree. I think the important problem to think about is how to get it to run piecemeal while we're waiting for events. I don't want the program to stutter when we should be handling events.
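One way to picture the "piecemeal" check is a scanner that keeps its directory iterator between calls and visits only a bounded number of entries each time. This is a sketch under that assumption; `PiecemealScanner`, `step` and `budget` are hypothetical names, and it does not describe how the watcher actually schedules its work.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Walks a tree a few entries at a time so an event loop is never blocked for
// the length of a full scan.
class PiecemealScanner {
 public:
  explicit PiecemealScanner(fs::path root) : root_(std::move(root)) {
    restart();
  }

  // Visit at most `budget` entries, then return control to the caller.
  // Returns true when a full pass over the tree has just completed.
  template <class Visit>
  bool step(std::size_t budget, Visit&& visit) {
    for (; budget > 0 && it_ != end_; --budget) {
      visit(it_->path());
      std::error_code ec;
      it_.increment(ec);
      if (ec) it_ = end_;  // abandon this pass; the next one starts fresh
    }
    if (it_ != end_) return false;
    restart();
    return true;
  }

 private:
  // Begin a new pass from the root. On error the iterator is left at "end".
  void restart() {
    std::error_code ec;
    it_ = fs::recursive_directory_iterator(
        root_, fs::directory_options::skip_permission_denied, ec);
  }

  fs::path root_;
  fs::recursive_directory_iterator it_;
  fs::recursive_directory_iterator end_;
};
```

The event loop could call `step` with a small budget whenever no events are pending, so the tree check advances in the gaps between events.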

e-dant commented 1 year ago

Only platform API left to do is Windows. I'm looking into ways to select fanotify if the user is a superuser (id 0).

It's rare, but it does happen: I've found some misreported event types on Darwin and some missing events on Linux.

I think the best way to build up a heuristic is to have a tree of "open" events (such as files being open, but not written, or locks being opened, but not closed). We could try to balance the tree by manually polling the "open" events every little while.

On Darwin, we could ignore the FSEvents API's file type reporting and opt in to std::filesystem::is_*. I'm tempted to think that's lightweight enough to use for many, many paths. It's likely just a stat under the hood. But we should test it.
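A small sketch of what opting in to `std::filesystem::is_*` could look like. It takes one `symlink_status` call per path (typically an `lstat` on POSIX systems) and maps the result to a kind string. The `"dir"` and `"file"` strings match the output shown earlier in this thread; `"sym_link"` and `"other"` are assumptions, as is the helper name `kind_of`.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Classify a path by asking the filesystem instead of trusting the event's
// reported type. Uses the error_code overload so it never throws.
inline std::string kind_of(fs::path const& p) {
  std::error_code ec;
  auto const st = fs::symlink_status(p, ec);  // does not follow symlinks
  if (ec) return "other";                     // missing or unreadable
  if (fs::is_symlink(st)) return "sym_link";
  if (fs::is_directory(st)) return "dir";
  if (fs::is_regular_file(st)) return "file";
  return "other";
}
```

One caveat with this approach: for a "destroy" event the path is already gone, so the kind cannot be recovered this way and would have to come from the event or from earlier state.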

Side note: We need performance tests.

e-dant commented 1 year ago

The next branch has system API calls for Windows. (It also has support for concurrent watchers.)

Docs are added to the readme in that branch as well (section "OS APIs").

I want to implement an fanotify adapter on Linux to select when the user is root. (Right now, we always use inotify.)
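The selection described here could be as small as a check on the effective user id. This is a sketch only: `LinuxAdapter` and `pick_adapter` are hypothetical names, and treating uid 0 as "fanotify is usable" is a simplification, since what fanotify needs is a privilege (historically `CAP_SYS_ADMIN`) rather than a particular uid.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Which Linux backend to use.
enum class LinuxAdapter { inotify, fanotify };

// Pure decision, kept separate so it can be tested without being root.
constexpr LinuxAdapter pick_adapter(uid_t effective_uid) {
  return effective_uid == 0 ? LinuxAdapter::fanotify : LinuxAdapter::inotify;
}

// Convenience overload that asks the OS for the current effective uid.
inline LinuxAdapter pick_adapter() { return pick_adapter(geteuid()); }
```

A more defensive variant would try `fanotify_init` and fall back to inotify if it fails, which would also cover root processes that lack the needed capability.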

I may close this issue after fanotify.

e-dant commented 1 year ago

Done :)