dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

API server-sent events: recursive monitoring of directories #5095

Open onnozweers opened 4 years ago

onnozweers commented 4 years ago

Dear dCache devs,

We're looking into server-sent events through the API. We can see several use cases. However, the first question we expect from our users is: can we monitor directories recursively? I could not find an answer in the documentation at https://github.com/dCache/dcache/blob/master/docs/UserGuide/src/main/markdown/frontend.md#storage-events. I tried events through the dCache View /events.html playground page, and I noticed that events inside subdirectories were not reported.

Kind regards, Onno

paulmillar commented 4 years ago

Hi Onno,

Yes, that is a reasonable question: I'm sorry the docs didn't make this clear.

The SSE "inotify" events are heavily based on the inotify(7) interface that Linux provides. This was done to make things easier for anyone who is already familiar with Linux's inotify system. So, you can learn a lot about the semantics from man inotify or by searching the internet for inotify.

To give you a direct answer: no, inotify doesn't (directly) support recursive notification, but it can still be done.

Background: inotify works by marking a particular inode as being of interest. Activity that targets that inode (and that matches the filter, if any) triggers an event.

So, for example, moving a file from one directory to another involves both the source and destination directories, as well as the file itself. Any watches on the source directory, the destination directory and the move target will receive an event.

Creating a file only affects the new file's parent directory. It doesn't affect that directory's parent directory.

(As an aside, this also explains the semantics if a "watched" directory is moved. Since a directory's inode does not change when the directory is moved, applications will continue to receive events after the directory is moved.)
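The inode-based semantics described above can be illustrated with a small toy model. This is only a sketch: ToyNamespace and its methods are hypothetical names invented for illustration, and nothing here is dCache or Linux code. Only the event names (IN_CREATE, IN_MOVED_FROM, IN_MOVED_TO, IN_MOVE_SELF) are taken from inotify(7).

```python
from collections import defaultdict
from itertools import count


class ToyNamespace:
    """Toy in-memory namespace (hypothetical) with watches keyed by inode."""

    def __init__(self):
        self._inodes = count(1)
        self.parent = {}                    # inode -> parent directory inode
        self.name = {}                      # inode -> entry name
        self.watches = defaultdict(list)    # inode -> list of event queues
        self.root = next(self._inodes)

    def watch(self, inode):
        """Mark an inode as being of interest; returns the list events land in."""
        events = []
        self.watches[inode].append(events)
        return events

    def _emit(self, inode, event, name):
        # Deliver the event only to watches placed on this exact inode.
        for events in self.watches[inode]:
            events.append((event, name))

    def create(self, parent, name):
        """Create a file or directory; only the parent directory is notified."""
        inode = next(self._inodes)
        self.parent[inode] = parent
        self.name[inode] = name
        self._emit(parent, "IN_CREATE", name)
        return inode

    def move(self, inode, new_parent):
        """Move an entry; source dir, destination dir and target are notified."""
        name = self.name[inode]
        self._emit(self.parent[inode], "IN_MOVED_FROM", name)
        self._emit(new_parent, "IN_MOVED_TO", name)
        self._emit(inode, "IN_MOVE_SELF", "")
        self.parent[inode] = new_parent     # the inode itself does not change


ns = ToyNamespace()
top = ns.create(ns.root, "top")
sub = ns.create(top, "sub")
top_events = ns.watch(top)
sub_events = ns.watch(sub)

ns.create(sub, "a.dat")                 # seen by the watch on sub, not on top
other = ns.create(ns.root, "other")
ns.move(sub, other)                     # the watch on sub survives the move
ns.create(sub, "b.dat")

print("top:", top_events)
print("sub:", sub_events)
```

In this model the watch on top never hears about a.dat or b.dat, because those files are created in sub, while the watch on sub keeps receiving events after sub has been moved.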

So, the inotify API (both as implemented in Linux and in dCache) does not support watching recursively for events.

However, an application can list the target directory and add a watch for each sub-directory it finds, repeating this recursively for every sub-directory. It can also watch for new directories being created and add watches when that happens.
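A minimal sketch of that approach, under stated assumptions: RecursiveWatcher, add_watch and handle_event are hypothetical names, add_watch stands in for whatever call registers a single non-recursive watch, and the local filesystem (via os.walk) stands in for listing directories in dCache. It is not the code from the repository linked below.

```python
import os
import tempfile


class RecursiveWatcher:
    """Keeps one non-recursive watch per directory beneath a root (sketch)."""

    def __init__(self, add_watch):
        # add_watch is a hypothetical callback that registers ONE watch
        # on ONE directory; how it does so is outside this sketch.
        self._add_watch = add_watch
        self.watched = set()

    def watch_tree(self, root):
        """List root recursively and add a watch for each directory found."""
        for dirpath, _dirnames, _filenames in os.walk(root):
            if dirpath not in self.watched:
                self._add_watch(dirpath)
                self.watched.add(dirpath)

    def handle_event(self, watched_dir, event, name, is_dir):
        """React to an event reported by the watch on watched_dir."""
        if event == "IN_CREATE" and is_dir:
            # A new directory appeared: walk it too, since it may already
            # contain sub-directories by the time the event is processed.
            self.watch_tree(os.path.join(watched_dir, name))


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))

    requested = []                       # records every add_watch call
    watcher = RecursiveWatcher(requested.append)
    watcher.watch_tree(root)

    # Simulate "a/c" (already containing "d") being created later on,
    # and the corresponding event arriving from the watch on "a".
    os.makedirs(os.path.join(root, "a", "c", "d"))
    watcher.handle_event(os.path.join(root, "a"), "IN_CREATE", "c", is_dir=True)

    relative = sorted(os.path.relpath(path, root) for path in requested)
    print(relative)
```

One caveat noted in inotify(7) applies to this pattern: by the time a watch is added to a newly created directory, files may already exist inside it, so an application that must not miss them should list the directory's contents right after adding the watch.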

Although this sounds complicated, it isn't too bad. I wrote some example Python code to demonstrate how recursive directory watching might be implemented:

https://github.com/paulmillar/dcache-sse

The code probably should be rewritten as a library so others can reuse it, but it hopefully shows that this is possible.

Cheers, Paul.

onnozweers commented 4 years ago

Hi Paul,

Thanks for clarifying this.

Cheers, Onno