dsoprea / PyInotify

An efficient and elegant inotify (Linux filesystem activity monitor) library for Python. Python 2 and 3 compatible.
GNU General Public License v2.0

Use of pathname instead of watch descriptor is problematic #16

Closed foogod closed 6 years ago

foogod commented 8 years ago

Using the path as the handle to refer to watches causes issues in some situations, because the pathname the watch was created against is not guaranteed to be unique/distinct for the life of the watch.

For example, take the following scenario:

  1. Create a watch against /dir/foo
  2. Somebody does mv /dir/foo /dir/bar (at this point, the watch created in step 1 is still active, still valid, and still watching the file which used to be /dir/foo, even though its name has now changed)
  3. Somebody creates a new /dir/foo file.
  4. The application wants to keep watching the old file, but also open a new watch on the new /dir/foo file.

In this situation, the inotify library gets hopelessly confused (and even if it didn't, it doesn't return adequate information from watch events for the app to be able to figure out which watch is indicating what).
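The rename scenario above can be reproduced without inotify at all, using inode numbers as a stand-in for the file identity that a watch follows. This is only a minimal sketch: the helper name `rotate_demo` and the file names are made up for illustration, and it assumes a POSIX filesystem.

```python
import os
import tempfile


def rotate_demo():
    """Show that one path can name two different files over time."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        bar = os.path.join(d, "bar")

        # Step 1: the file that a watch would be created against.
        with open(foo, "w") as f:
            f.write("old\n")
        old_inode = os.stat(foo).st_ino

        # Step 2: rename it; the file itself is unchanged.
        os.rename(foo, bar)
        renamed_inode = os.stat(bar).st_ino

        # Step 3: a new file appears under the original name.
        with open(foo, "w") as f:
            f.write("new\n")
        new_inode = os.stat(foo).st_ino

        return old_inode, renamed_inode, new_inode


old, renamed, new = rotate_demo()
print(old == renamed)  # same file, different name
print(old != new)      # same name, different file
```

Anything keyed on the string "/dir/foo" cannot tell the first and third files apart, which is the ambiguity described here.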

This is actually causing problems for me as I'm trying to write an application to monitor log files (which regularly get rotated in this way).

My recommendation would be that Inotify.add_watch should create a unique "Watch object" for each active watch, which it returns to the caller, and would then be provided as part of watch events (and could be supplied as an argument to remove_watch, etc.) to be able to uniquely distinguish between watches, even if multiple ones have the same associated path.
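A rough sketch of what such a "Watch object" could look like is below. None of this is PyInotify's actual API: `Watch`, `WatchRegistry`, and `lookup` are hypothetical names, and a simple counter stands in for the descriptor the kernel would hand back.

```python
import itertools


class Watch:
    """Hypothetical handle for one active watch (not part of PyInotify)."""

    def __init__(self, wd, path):
        self.wd = wd      # watch descriptor, as the kernel would return it
        self.path = path  # path at creation time; informational only

    def __repr__(self):
        return "Watch(wd=%d, path=%r)" % (self.wd, self.path)


class WatchRegistry:
    """Tracks watches by descriptor so duplicate paths do not collide."""

    def __init__(self):
        self._by_wd = {}
        # Stand-in for the kernel's descriptor allocation (an assumption).
        self._next_wd = itertools.count(1)

    def add_watch(self, path):
        watch = Watch(next(self._next_wd), path)
        self._by_wd[watch.wd] = watch
        return watch

    def remove_watch(self, watch):
        del self._by_wd[watch.wd]

    def lookup(self, wd):
        # An event carries a wd; map it back to the caller's handle.
        return self._by_wd[wd]


registry = WatchRegistry()
old_log = registry.add_watch("/dir/foo")  # later renamed to /dir/bar
new_log = registry.add_watch("/dir/foo")  # the new file at the same path
```

Because events would be resolved through `lookup(wd)` and not through the path, `old_log` and `new_log` stay distinct even though both were created against "/dir/foo".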

dsoprea commented 8 years ago

It seems like you have a handle on how it works. A PR might get you a solution quicker. Otherwise, it might take me a few days to get to this.

dsoprea commented 8 years ago

Follow #19. He coincidentally reported the same issue and will be submitting a PR for this imminently.

dsoprea commented 8 years ago

Taking a more careful look at this, the project doesn't really experience confusion, though we'd be tracking a directory that we'll no longer receive events for, and we won't receive events for the new directory. This is deterministic and consistent behavior.

I'm not clear on how this affects you, though. If you're storing the paths, then you just need to update your record of them in response to a move/rename event. If there is some watch object or handle, it'll just suddenly be bad after a move/rename (same difference with a path for a watch that is no longer recognized). It's the same problem, on your end, if you're ignoring the renames, either way.
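One way to read "update your record in response to a move/rename event" is sketched below. It assumes the watch is on the parent directory, where the kernel reports a rename as an IN_MOVED_FROM / IN_MOVED_TO pair sharing a cookie. The `(mask, cookie, name)` tuples are a simplified stand-in, not PyInotify's event format, and `apply_renames` is a hypothetical helper.

```python
IN_MOVED_FROM = 0x00000040  # values from <sys/inotify.h>
IN_MOVED_TO = 0x00000080


def apply_renames(tracked, events):
    """Update `tracked` (name -> label) from (mask, cookie, name) tuples.

    Returns the labels of files that left the directory with no matching
    IN_MOVED_TO, i.e. files whose new name is unknown.
    """
    pending = {}  # cookie -> label awaiting its IN_MOVED_TO half
    for mask, cookie, name in events:
        if mask & IN_MOVED_FROM and name in tracked:
            pending[cookie] = tracked.pop(name)
        elif mask & IN_MOVED_TO and cookie in pending:
            tracked[name] = pending.pop(cookie)
    return list(pending.values())


tracked = {"foo": "old log"}
lost = apply_renames(tracked, [
    (IN_MOVED_FROM, 7, "foo"),
    (IN_MOVED_TO, 7, "bar"),
])
```

After the call, `tracked` maps "bar" to the old label. A move out of the watched directory produces only the IN_MOVED_FROM half, so the label ends up in the returned list with no new name to attach it to.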

foogod commented 8 years ago

There are a couple of problems. The big one is that the watch events that the Python inotify code returns for the (renamed) file still all report the old filename, not the new one, so events from both files are identified with the same tag/filename and it's not possible to tell which is which.

Even if this weren't the case, tracking the filename changes can be tricky or impossible (for example, if the file is actually deleted, it has no new filename, but may still continue to receive writes from programs which still have it open). Likewise, if it's moved/renamed to somewhere outside of the watched directory, I will never receive notification of what its new filename is (not as likely with my particular application, but still potentially possible, and potentially a problem for other use cases).
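The deleted-but-still-written case is easy to demonstrate with the standard library alone. This is a small sketch assuming a POSIX system and a local temporary directory; it does not involve inotify itself.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.unlink(path)                 # the file now has no name at all
    os.write(fd, b"still alive\n")  # but an open descriptor can still write
    info = os.fstat(fd)
    links, size = info.st_nlink, info.st_size
finally:
    os.close(fd)

print(links, size)
```

The file keeps growing while its link count shows that no path refers to it any more, so there is no filename a path-based record could be updated to.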

The long and the short of it is that the kernel inotify function uses distinct (numerical) handles to refer to watches for a reason: filenames are not a reliable way to track files/watches under a variety of possible (and even, in some cases, likely) scenarios.

FWIW, #19 seems to be similar in some ways, but it is not the same issue I'm encountering. I am not using InotifyTree at all. I'm just trying to watch a few specific files within a single directory, but those files may change (or lose altogether) their filenames while being watched, and other new files (which I also want to watch) may then have exactly the same filename as the original file had. The current Python inotify behavior makes those cases pretty much impossible to distinguish between or account for.

(Sorry about the delay in responding. The project that I encountered this issue with has sorta been sidelined for a while with a bunch of more important stuff taking priority, so this isn't a huge issue for me at the moment, but it will likely come back up when I get a chance to get back to it. If it's still an issue at the point I have some time to work on it, I'll see if I can put together a PR for you. Until then, I figured it was still good to have some record of the issue/problem.)