dsoprea / PyInotify

An efficient and elegant inotify (Linux filesystem activity monitor) library for Python. Python 2 and 3 compatible.
GNU General Public License v2.0

InotifyTree does not pick up file events if the directories are created after monitoring starts #110

Status: Open. oonisim opened this issue 11 months ago.

oonisim commented 11 months ago

Environment

Reproducing the issue

If the directories are created after monitoring has started, file events (e.g. IN_CLOSE_WRITE) are not picked up.

import os
import pathlib
import tempfile
from inotify.adapters import (
    InotifyTree
)
from inotify.constants import (
    IN_ALL_EVENTS,
)

def test():
    with tempfile.TemporaryDirectory() as directory:
        monitor: InotifyTree = InotifyTree(path=directory, mask=IN_ALL_EVENTS)
        events = monitor.event_gen(yield_nones=False)
        for file in [
            "20231112/0001.txt",
            "20231113/0002.txt",
        ]:
            path_to_file: str = os.path.join(directory, file)
            folder: str = str(pathlib.Path(path_to_file).parent)
            # The parent directory is created after the tree watch is set up.
            os.makedirs(folder, exist_ok=True)
            with open(file=path_to_file, mode="w", encoding='utf-8') as _file:
                _file.write("foobar")

        # Events are only read (and watches on new directories only added)
        # from here on, after all files have been written. With
        # yield_nones=False this loop blocks waiting for more events.
        while True:
            _, event_names, folder, filename = next(events)
            print(f"{folder} {filename} {event_names}")
---
-------------------------------- live log call ---------------------------------
DEBUG    inotify.adapters:adapters.py:58 Inotify handle is (5).
DEBUG    inotify.adapters:adapters.py:343 Adding initial watches on tree: [/tmp/tmphtf2h_ts]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmphtf2h_ts]
DEBUG    inotify.adapters:adapters.py:96 Added watch (1): [/tmp/tmphtf2h_ts]
DEBUG    inotify.adapters:adapters.py:228 Events received from epoll: ['IN_ACCESS']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CREATE', 'IN_ISDIR']
DEBUG    inotify.adapters:adapters.py:295 A directory has been created. We're adding a watch on it (because we're being recursive): [/tmp/tmphtf2h_ts/20231112]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmphtf2h_ts/20231112]
DEBUG    inotify.adapters:adapters.py:96 Added watch (2): [/tmp/tmphtf2h_ts/20231112]
/tmp/tmphtf2h_ts 20231112 ['IN_CREATE', 'IN_ISDIR']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CREATE', 'IN_ISDIR']
DEBUG    inotify.adapters:adapters.py:295 A directory has been created. We're adding a watch on it (because we're being recursive): [/tmp/tmphtf2h_ts/20231113]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmphtf2h_ts/20231113]
DEBUG    inotify.adapters:adapters.py:96 Added watch (3): [/tmp/tmphtf2h_ts/20231113]
/tmp/tmphtf2h_ts 20231113 ['IN_CREATE', 'IN_ISDIR']

If the directories are created in advance, file events (e.g. IN_CLOSE_WRITE) are picked up.

def test2():
    with tempfile.TemporaryDirectory() as directory:
        # The directories exist before the tree watch is set up.
        os.makedirs(os.path.join(directory, "20231112"), exist_ok=True)
        os.makedirs(os.path.join(directory, "20231113"), exist_ok=True)

        monitor: InotifyTree = InotifyTree(path=directory, mask=IN_ALL_EVENTS)
        events = monitor.event_gen(yield_nones=False)
        for file in [
            "20231112/0001.txt",
            "20231113/0002.txt",
        ]:
            path_to_file: str = os.path.join(directory, file)
            folder: str = str(pathlib.Path(path_to_file).parent)
            os.makedirs(folder, exist_ok=True)  # No-op: the folder already exists.
            with open(file=path_to_file, mode="w", encoding='utf-8') as _file:
                _file.write("foobar")

        # Blocks waiting for more events once the queued ones are consumed.
        while True:
            _, event_names, folder, filename = next(events)
            print(f"{folder} {filename} {event_names}")

-------------------------------- live log call ---------------------------------
DEBUG    inotify.adapters:adapters.py:58 Inotify handle is (5).
DEBUG    inotify.adapters:adapters.py:343 Adding initial watches on tree: [/tmp/tmpdlzodbg_]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmpdlzodbg_]
DEBUG    inotify.adapters:adapters.py:96 Added watch (1): [/tmp/tmpdlzodbg_]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmpdlzodbg_/20231112]
DEBUG    inotify.adapters:adapters.py:96 Added watch (2): [/tmp/tmpdlzodbg_/20231112]
DEBUG    inotify.adapters:adapters.py:82 Adding watch: [/tmp/tmpdlzodbg_/20231113]
DEBUG    inotify.adapters:adapters.py:96 Added watch (3): [/tmp/tmpdlzodbg_/20231113]
DEBUG    inotify.adapters:adapters.py:228 Events received from epoll: ['IN_ACCESS']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CREATE']
/tmp/tmpdlzodbg_/20231112 0001.txt ['IN_CREATE']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_OPEN']
/tmp/tmpdlzodbg_/20231112 0001.txt ['IN_OPEN']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_MODIFY']
/tmp/tmpdlzodbg_/20231112 0001.txt ['IN_MODIFY']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CLOSE_WRITE']
/tmp/tmpdlzodbg_/20231112 0001.txt ['IN_CLOSE_WRITE']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CREATE']
/tmp/tmpdlzodbg_/20231113 0002.txt ['IN_CREATE']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_OPEN']
/tmp/tmpdlzodbg_/20231113 0002.txt ['IN_OPEN']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_MODIFY']
/tmp/tmpdlzodbg_/20231113 0002.txt ['IN_MODIFY']
DEBUG    inotify.adapters:adapters.py:169 Events received in stream: ['IN_CLOSE_WRITE']
/tmp/tmpdlzodbg_/20231113 0002.txt ['IN_CLOSE_WRITE']
oonisim commented 11 months ago

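This is expected behavior, as described in https://github.com/dsoprea/PyInotify#notes. The sample code creates the directories and writes the files in the same thread before it starts consuming event_gen(), so InotifyTree only adds the watches on the new directories after the files have already been written, and their events are missed.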

IMPORTANT:
Recursively monitoring paths is not a functionality provided by the kernel.
Rather, we artificially implement it. As directory-created events are received,
we create watches for the child directories on-the-fly.

This means that there is potential for a race condition:
if a directory is created and a file or directory is created inside before you
(using the event_gen() loop) have a chance to observe it, then you are going to
have a problem: If it is a file, then you will miss the events related to its
creation, but, if it is a directory, then not only will you miss those creation
events but this library will also miss them and not be able to add a watch for them.
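The timing in the first reproducer follows from Python generators being lazy: the body of event_gen(), where InotifyTree adds watches on newly created directories, runs only when the caller asks for the next event. The toy model below illustrates this in plain Python; event_gen and main here are hypothetical stand-ins, not PyInotify's code, and the pre-filled list stands in for events the kernel has already queued.

```python
from typing import Iterator, List


def event_gen(queued: List[str], log: List[str]) -> Iterator[str]:
    """Toy stand-in for InotifyTree.event_gen(); hypothetical, not PyInotify's code."""
    for event in queued:
        # The "add a watch on the new directory" side effect lives in the
        # generator body, so it runs only when the caller requests an event.
        if event.startswith("IN_CREATE|IN_ISDIR"):
            log.append("add watch")
        yield event


def main() -> List[str]:
    log: List[str] = []
    events = event_gen(["IN_CREATE|IN_ISDIR 20231112"], log)  # Nothing runs yet.
    log.append("write 0001.txt")  # The file is written before any watch exists.
    next(events)  # Only now is the watch on 20231112 "added".
    return log


if __name__ == "__main__":
    print(main())  # ['write 0001.txt', 'add watch']
```

In test(), the files are written while no watch exists on their directories, so the kernel never generates events for them. In test2(), the watches are in place first, so the events are queued and delivered when the loop finally reads them.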
rpl-ian-lunam commented 4 months ago

This is also a problem if the directory is created with mkdir -p, which also creates the parent directories. Only the first new directory in the tree is noticed.

For example, if /a/b is being watched and mkdir -p /a/b/c/d/e is run, only /a/b/c is added to the watch. Any subsequent events in /a/b/c/d or /a/b/c/d/e will not be captured.

This should be an easy fix: when a directory is added, check whether it has subdirectories and add watches for them as well.
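A minimal sketch of that idea, assuming a callback add_watch that registers one inotify watch (a hypothetical parameter, not PyInotify's API; watch_new_subtree and demo are likewise hypothetical names). It watches each new directory first and only then lists it, so a child created before the listing appears in it, and a child created after the watch triggers its own event. In the first log of this issue, each created directory produces a single "Adding watch" line, which is the step a walk like this would replace.

```python
import os
import tempfile
from collections import deque
from typing import Callable, Deque, List


def watch_new_subtree(new_dir: str, add_watch: Callable[[str], None]) -> List[str]:
    """Hypothetical helper: watch new_dir and every directory already below it.

    add_watch is an assumed callback that registers a single inotify watch;
    it is not part of PyInotify's API.
    """
    watched: List[str] = []
    pending: Deque[str] = deque([new_dir])
    while pending:
        current = pending.popleft()
        # Watch first, then list: a child created before the listing shows up
        # in it, and a child created afterwards triggers an event on this watch.
        add_watch(current)
        watched.append(current)
        try:
            with os.scandir(current) as it:
                entries = sorted(it, key=lambda entry: entry.name)
        except FileNotFoundError:
            continue  # The directory was removed before it could be listed.
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                pending.append(entry.path)
    return watched


def demo() -> List[str]:
    """Simulate `mkdir -p <tmp>/c/d/e` and return the watched paths relative to <tmp>."""
    calls: List[str] = []
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "c", "d", "e"))
        # In the issue, only <tmp>/c would get a watch; here d and e are found too.
        watch_new_subtree(os.path.join(root, "c"), calls.append)
        return [os.path.relpath(path, root) for path in calls]


if __name__ == "__main__":
    print(demo())  # ['c', 'c/d', 'c/d/e'] on Linux
```

Because of the watch-then-list ordering, a directory can be seen twice (once in a listing, once through its own IN_CREATE event); inotify_add_watch() returns the existing watch descriptor for an inode that is already watched, so the duplicate is harmless at the kernel level. Files that already exist inside the new directories are still missed, as the README note explains.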

rpl-ian-lunam commented 4 months ago

I'm not good enough with Python to provide a PR, but it seems to me that the add-watch step should be calling __load_trees, although that method is in another class.