e-dant / watcher

Filesystem watcher. Works anywhere. Simple, efficient and friendly.
MIT License
690 stars 39 forks source link
c cpp efficient fast filesystem friendly header-only javascript nodejs python safe simple single-header thread-safe watcher

Watcher

Conan Center Rust/Cargo Crate PyPI/Pip Package Builds and Publishing for Distribution CodeQL Tests Linux Tests macOS Tests Android Compilation Tests Windows Tests

Quick Start

C++ ```cpp #include "wtr/watcher.hpp" #include #include using namespace std; using namespace wtr; // The event type, and every field within it, has // string conversions and stream operators. All // kinds of strings -- Narrow, wide and weird ones. // If we don't want particular formatting, we can // json-serialize and show the event like this: // some_stream << event // Here, we'll apply our own formatting. auto show(event e) { cout << to(e.effect_type) + ' ' + to(e.path_type) + ' ' + to(e.path_name) + (e.associated ? " -> " + to(e.associated->path_name) : "") << endl; } auto main() -> int { // Watch the current directory asynchronously, // calling the provided function on each event. auto watcher = watch(".", show); // Do some work. (We'll just wait for a newline.) getchar(); // The watcher would close itself around here, // though we can check and close it ourselves. return watcher.close() ? 0 : 1; } ``` ```sh # Sigh PLATFORM_EXTRAS=$(test "$(uname)" = Darwin && echo '-isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -framework CoreFoundation -framework CoreServices') # Build eval c++ -std=c++17 -Iinclude src/wtr/tiny_watcher/main.cpp -o watcher $PLATFORM_EXTRAS # Run ./watcher ```
C ```c #include "wtr/watcher-c.h" #include void callback(struct wtr_watcher_event event, void* _ctx) { printf( "path name: %s, effect type: %d path type: %d, effect time: %lld, associated path name: %s\n", event.path_name, event.effect_type, event.path_type, event.effect_time, event.associated_path_name ? event.associated_path_name : "" ); } int main() { void* watcher = wtr_watcher_open(".", callback, NULL); getchar(); return ! wtr_watcher_close(watcher); } ```
Python ```sh pip install wtr-watcher ``` ```python from watcher import Watch with Watch(".", print): input() ```
Rust ```sh cargo add wtr-watcher tokio futures ``` ```rust use futures::StreamExt; use wtr_watcher::Watch; #[tokio::main(flavor = "current_thread")] async fn main() -> Result<(), Box> { let show = |e| async move { println!("{e:?}") }; let events = Watch::try_new(".")?; events.for_each(show).await; Ok(()) } ```
Node.js ```javascript import * as watcher from 'watcher'; var w = watcher.watch('.', (event) => { console.log(event); }); process.stdin.on('data', () => { w.close(); process.exit(); }); ```

The output of each above will be something this, depending on the format:

modify file /home/e-dant/dev/watcher/.git/refs/heads/next.lock
rename file /home/e-dant/dev/watcher/.git/refs/heads/next.lock -> /home/e-dant/dev/watcher/.git/refs/heads/next
create file /home/e-dant/dev/watcher/.git/HEAD.lock

Enjoy!


Tell Me More

A filesystem event watcher which is

  1. Friendly

    I try to keep the 1579 lines that make up the runtime of Watcher relatively simple and the API practical:

    auto w = watch(path, [](event ev) { cout << ev; });
    wtr.watcher ~
  2. Modular

    Watcher may be used as a library, a program, or both. If you aren't looking to create something with the library, no worries. Just use ours and you've got yourself a filesystem watcher which prints filesystem events as JSON. Neat. Here's how:

    # The main branch is the (latest) release branch.
    git clone https://github.com/e-dant/watcher.git && cd watcher
    # Via Nix
    nix run | grep -oE 'cmake-is-tough'
    # With the build script
    tool/build --no-build-test --no-run && cd out/this/Release # Build the release version for the host platform.
    ./wtr.watcher | grep -oE 'needle-in-a-haystack/.+"' # Use it, pipe it, whatever. (This is an .exe on Windows.)
  3. Efficient

    You can watch an entire filesystem with this project. In almost all cases, we use a near-zero amount of resources and make efficient use of the cache. We regularly test that the overhead of detecting and sending an event to the user is an order of magnitude less than the filesystem operations being measured.

  4. Well Tested

    We run this project through unit tests against all available sanitiziers. This code tries hard to be thread, memory, bounds, type and resource-safe. What we lack from the language, we try to make up for with testing. For some practical definition of safety, this project probably fits.

  5. Dependency Minimal

    Watcher depends on the C++ Standard Library. For efficiency, we leverage the OS when possible on Linux, Darwin and Windows. For testing and debugging, we use Snitch and Sanitizers.

  6. Portable

    Watcher is runnable almost anywhere. The only requirement is a filesystem.


Usage

Project Content

The important pieces are the (header-only) library and the (optional) CLI program.

A directory tree is in the notes below.

The Library

The two fundamental building blocks here are:

watch takes a path, which is a string-like thing, and a callback, with is a function-like thing. For example, passing watch a character array and a closure would work well in C++.

Examples for a variety of languages can be found in the Quick Start. The API is relatively consistent across languages.

The watcher will happily continue watching until you stop it or it hits an unrecoverable error.

The event object is used to pass information about filesystem events to the callback given (by you) to watch.

The event object will contain:

(Note that, for JavaScript, we use the camel-case, to be consistent with that language's ecosystem.)

State Changes and Special Events

The watcher type is special.

Events with this type will include messages from the watcher. You may recieve error messages or important status updates.

This format was chosen to support asynchronous messages from the watcher in a generic, portable format.

Two of the most important "watcher" events are the initial "live" event and the final "die" event.

The message appears prepended to the watched base path.

For example, after opening a watcher at /a/path, you may receive these messages from the watcher:

The messages always begin with either an s, indicating a successful operation, a w, indicating a non-fatal warning, or an e, indicating a fatal error.

Importantly, closing the watcher will always produce an error if

The last event will always be a destroy event from the watcher. You can parse it like this, for some event ev:

ev.path_type == path_type::watcher && ev.effect_type == effect_type::destroy;

Happy hacking.

Your Project

This project tries to make it easy for you to work with filesystem events. I think good tools are easy to use. If this project is not ergonomic, file an issue.

Here is a snapshot of the output taken while preparing this commit, right before writing this paragraph.

{
  "1666393024210001000": {
    "path_name": "/home/edant/dev/watcher/.git/logs/HEAD",
    "effect_type": "modify",
    "path_type": "file"
  },
  "1666393024210026000": {
    "path_name": "/home/edant/dev/watcher/.git/logs/refs/heads/next",
    "effect_type": "modify",
    "path_type": "file"
  },
  "1666393024210032000": {
    "path_name": "/home/edant/dev/watcher/.git/refs/heads/next.lock",
    "effect_type": "create",
    "path_type": "other"
  }
}

Which is pretty cool.

A capable program is here.

Consume

This project is accessible through:

Conan See the [package here](https://conan.io/center/recipes/watcher).
Nix ```sh nix build # To just build nix run # Build the default target, then run without arguments nix run . -- / | jq # Build and run, watch the root directory, pipe it to jq nix develop # Enter an isolated development shell with everything needed to explore this project ```
Bazel ```sh bazel build cli # Build, but don't run, the cli bazel build hdr # Ditto, for the single-header bazel run cli # Run the cli program without arguments ```
`tool/build` ```sh tool/build cd out/this/Release # watches the current directory forever ./wtr.watcher # watches some path for 10 seconds ./wtr.watcher 'your/favorite/path' -s 10 ``` This will take care of some platform-specifics, building the release, debug, and sanitizer variants, and running some tests.
CMake ```sh cmake -S . -B out cmake --build out --config Release cd out # watches the current directory forever ./wtr.watcher # watches some path for 10 seconds ./wtr.watcher 'your/favorite/path' -s 10 ```

Bugs & Limitations

"Access" events are ignored Watchers on all platforms intentionally ignore modification events which only change the acess time on a file or directory. The utility of those events was questionable. It seemed more harmful than good. Other watchers, like Microsoft's C# watcher, ignore them by default. Some user applications rely on modification events to know when themselves to reload a file. Better, more complete solutions exist, and these defaults might again change. Providing a way to ignore events from a process-id, a shorthand from "this" process, and a way to specify which kinds of event sources we are interested in are good candidates for more complete solutions.
Safety and C++ I was comfortable with C++ when I first wrote this. I later rewrote this project in Rust as an experiment. There are benefits and drawbacks to Rust. Some things were a bit safer to express, other things were definitely not. The necessity of doing pointer math on some variably-sized opaque types from the kernel, for example, is not safer to express in Rust. Other things are safer, but this project doesn't benefit much from them. Rust really shines in usability and expression. That might be enough of a reason to use it. Among other things, we could work with async traits and algebraic types for great good. I'm not sure if there is a language that can "just" make the majority of the code in this project safe by definition. The guts of this project, the adapters, talk to the kernel. They are bound to use unsafe, ill-typed, caveat-rich system-level interfaces. The public API is just around 100 lines, is well-typed, well-tested, and human-verifiable. Not much happens there. Creating an FFI by exposing the adapters with a C ABI might be worthwhile. Most languages should be able to hook into that. The safety of the platform adapters necessarily depends on each platform's documentation for their interfaces. Like with all system-level interfaces, as long as we ensure the correct pre-and-post-conditions, and those conditions are well-defined, we should be fine.
Platform-specific adapter selection Among the platform-specific [implementations](https://github.com/e-dant/watcher/tree/release/devel/include/detail/wtr/watcher/adapter), the `FSEvents` API is used on Darwin and the `ReadDirectoryChanges` API is used on Windows. There is some extra work we do to select the best adapter on Linux. The `fanotify` adapter is used when the kernel version is greater than 5.9, the containing process has root priveleges, and the necessary system calls are otherwise allowed. The system calls associated with `fanotify` may be disallowed when inside a container or cgroup, despite the necessary priviledges and kernel version. The `inotify` adapter is used otherwise. You can find the selection code for Linux [here](https://github.com/e-dant/watcher/blob/next/devel/include/detail/wtr/watcher/adapter/linux/watch.hpp). The namespaces for our [adapters](https://github.com/e-dant/watcher/tree/release/devel/include/detail/wtr/watcher/adapter) are inline. When the (internal) `detail::...::watch()` function is [invoked](https://github.com/e-dant/watcher/blob/release/devel/include/wtr/watcher-/watch.hpp#L65), it resolves to one (and only one) platform-specifc implementation of the `watch()` function. One symbol, many platforms, where the platforms are inline namespaces.
Exceptions to Efficient Scanning Efficiency takes a hit when we bring out the `warthog`, our platform-independent adapter. This adapter is used on platforms that lack better alternatives, such as (not Darwin) BSD and Solaris (because `warthog` beats `kqueue`). *Watcher* is still relatively efficient when it has no alternative better than `warthog`. As a thumb-rule, scanning more than one-hundred-thousand paths with `warthog` might stutter. I'll keep my eyes open for better kernel APIs on BSD.
Ready State There is no reliable way to communicate when a watcher is ready to send events to the callback. For a few thousand paths, this may take a few milliseconds. For tens-of-thousands of paths, consider waiting a few seconds.
Unsupported events None of the platform-specific implementations provide information on what attributes were changed from. This makes supporting those events dependant on storing this information ourselves. Storing maps of paths to `stat` structures, diffing them on attribute changes, is a non-insignificant memory commitment. The owner and attribute events are unsupported because I'm not sure how to support those events efficienty.
Unsupported filesystems Special filesystems, including `/proc` and `/sys`, cannot be watched with `inotify`, `fanotify` or the `warthog`. Future work may involve dispatching ebpf programs for the kernel to use. This would allow us to monitor for `modify` events on some of those special filesystem.
Resource limitations The number of watched files is limited when `inotify` is used.

Relevant OS APIs Used

Linux - `inotify` - `fanotify` - `epoll` - `eventfd`
Darwin - `FSEvents` - `dispatch`
Windows - `ReadDirectoryChangesW` - `IoCompletionPort`

Minimum C++ Version

For the header-only library and the tiny-watcher, C++17 and up should be fine.

We might use C++20 coroutines someday.

Cache Efficiency

$ tool/gen-event/dir &
$ tool/gen-event/file &
$ valgrind --tool=cachegrind wtr.watcher ~ -s 30
I   refs:      797,368,564
I1  misses:          6,807
LLi misses:          2,799
I1  miss rate:        0.00%
LLi miss rate:        0.00%

D   refs:      338,544,669  (224,680,988 rd   + 113,863,681 wr)
D1  misses:         35,331  (     24,823 rd   +      10,508 wr)
LLd misses:         11,884  (      8,121 rd   +       3,763 wr)
D1  miss rate:         0.0% (        0.0%     +         0.0%  )

LLd miss rate:         0.0% (        0.0%     +         0.0%  )
LL refs:            42,138  (     31,630 rd   +      10,508 wr)
LL misses:          14,683  (     10,920 rd   +       3,763 wr)
LL miss rate:          0.0% (        0.0%     +         0.0%  )

Namespaces and the Directory Tree

Namespaces and symbols closely follow the directories in the devel/include folder. Inline namespaces are in directories with the - affix.

For example, wtr::watch is inside the file devel/include/wtr/watcher-/watch.hpp. The namespace watcher in wtr::watcher::watch is anonymous by this convention.

More in depth: the function ::detail::wtr::watcher::adapter::watch() is defined inside one (and only one!) of the files devel/include/detail/wtr/watcher/adapter/*/watch.hpp, where * is decided at compile-time (depending on the host's operating system).

All of the headers in devel/include are amalgamated into include/wtr/watcher.hpp and an include guard is added to the top. The include guard doesn't change with the release version. In the future, it might.

watcher
├── src
│  └── wtr
│     ├── watcher
│     │  └── main.cpp
│     └── tiny_watcher
│        └── main.cpp
├── out
├── include
│  └── wtr
│     └── watcher.hpp
└── devel
   ├── src
   │  └── wtr
   └── include
      ├── wtr
      │  ├── watcher.hpp
      │  └── watcher-
      │     ├── watch.hpp
      │     └── event.hpp
      └── detail
         └── wtr
            └── watcher
               ├── semabin.hpp
               └── adapter
                  ├── windows
                  │  └── watch.hpp
                  ├── warthog
                  │  └── watch.hpp
                  ├── linux
                  │  ├── watch.hpp
                  │  ├── sysres.hpp
                  │  ├── inotify
                  │  │  └── watch.hpp
                  │  └── fanotify
                  │     └── watch.hpp
                  └── darwin
                     └── watch.hpp

You can run tool/tree to view this tree locally.

Comparison with Similar Projects ```yml https://github.com/notify-rs/notify: lines of code: 2799 lines of tests: 475 lines of docs: 1071 implementation languages: rust interface languages: rust supported platforms: linux, windows, darwin, bsd kernel apis: inotify, readdirectorychanges, fsevents, kqueue non-blocking: yes dependencies: none tests: yes static analysis: yes (borrow checked, memory and concurrency safe language) https://github.com/e-dant/watcher: lines of code: 1579 lines of tests: 881 lines of docs: 1977 implementation languages: cpp interface languages: cpp, shells supported platforms: linux, darwin, windows, bsd kernel apis: inotify, fanotify, fsevents, readdirectorychanges non-blocking: yes dependencies: none tests: yes static analysis: yes https://github.com/facebook/watchman.git: lines of code: 37435 lines of tests: unknown lines of docs: unknown implementation languages: cpp, c interface languages: cpp, js, java, python, ruby, rust, shells supported platforms: linux, darwin, windows, maybe bsd kernel apis: inotify, fsevents, readdirectorychanges non-blocking: yes dependencies: none tests: yes (many) static analysis: yes (all available) https://github.com/p-ranav/fswatch: lines of code: 245 lines of tests: 19 lines of docs: 114 implementation languages: cpp interface languages: cpp, shells supported platforms: linux, darwin, windows, bsd kernel apis: inotify non-blocking: maybe dependencies: none tests: some static analysis: none https://github.com/tywkeene/go-fsevents: lines of code: 413 lines of tests: 401 lines of docs: 384 implementation languages: go interface languages: go supported platforms: linux kernel apis: inotify non-blocking: yes dependencies: yes tests: yes static analysis: none (gc language) https://github.com/radovskyb/watcher: lines of code: 552 lines of tests: 767 lines of docs: 399 implementation languages: go interface languages: go supported platforms: linux, darwin, windows kernel apis: none non-blocking: no dependencies: none tests: yes static analysis: none https://github.com/parcel-bundler/watcher: lines of code: 2862 lines of tests: 474 lines of docs: 791 implementation languages: cpp interface languages: js supported platforms: linux, darwin, windows, maybe bsd kernel apis: fsevents, inotify, readdirectorychanges non-blocking: yes dependencies: none tests: some (js bindings) static analysis: none (interpreted language) https://github.com/atom/watcher: lines of code: 7789 lines of tests: 1864 lines of docs: 1334 implementation languages: cpp interface languages: js supported platforms: linux, darwin, windows, maybe bsd kernel apis: inotify, fsevents, readdirectorychanges non-blocking: yes dependencies: none tests: some (js bindings) static analysis: none https://github.com/paulmillr/chokidar: lines of code: 1544 lines of tests: 1823 lines of docs: 1377 implementation languages: js interface languages: js supported platforms: linux, darwin, windows, bsd kernel apis: fsevents non-blocking: maybe dependencies: yes tests: yes (many) static analysis: none (interpreted language) https://github.com/Axosoft/nsfw: lines of code: 2536 lines of tests: 1085 lines of docs: 148 implementation languages: cpp interface languages: js supported platforms: linux, darwin, windows, maybe bsd kernel apis: fsevents non-blocking: maybe dependencies: yes (many) tests: yes (js bindings) static analysis: none https://github.com/canton7/SyncTrayzor: lines of code: 17102 lines of tests: 0 lines of docs: 2303 implementation languages: c# interface languages: c# supported platforms: windows kernel apis: unknown non-blocking: yes dependencies: unknown tests: none static analysis: none (managed language) https://github.com/g0t4/Rx-FileSystemWatcher: lines of code: 360 lines of tests: 0 lines of docs: 46 implementation languages: c# interface languages: c# supported platforms: windows kernel apis: unknown non-blocking: yes dependencies: unknown tests: yes static analysis: none (managed language) ```