dsoprea / PyInotify

An efficient and elegant inotify (Linux filesystem activity monitor) library for Python. Python 2 and 3 compatible.
GNU General Public License v2.0

Ignores IN_Q_OVERFLOW events #23

Closed kimyx closed 6 years ago

kimyx commented 7 years ago

We are using PyInotify on large directory trees holding millions of files, with the possibility of large batches of file additions or deletions at any time. By keeping the kernel-level queue large enough we generally avoid IN_Q_OVERFLOW events from the kernel. However, we've been surprised a few times, and in those cases we lose file events with no warning.

It would be good to add explicit handling for the IN_Q_OVERFLOW event. In 0.2.7, if the kernel generates IN_Q_OVERFLOW, I believe __handle_inotify_event returns immediately, event_gen's loop terminates, and it then yields a final None event with no explanation.
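In the meantime, a caller can detect the condition themselves by testing the mask bit on each event header. A minimal sketch, assuming the (wd, mask, cookie, len) header shape that event_gen yields and the IN_Q_OVERFLOW value from `<sys/inotify.h>` (the Header namedtuple here only models that shape; saw_overflow is an illustration, not part of the library):

```python
from collections import namedtuple

# Models the (wd, mask, cookie, len) header tuple that event_gen yields;
# not PyInotify's own type.
Header = namedtuple('Header', 'wd mask cookie len')

IN_Q_OVERFLOW = 0x00004000  # value from <sys/inotify.h>

def saw_overflow(events):
    """Return True if any (header, type_names, path, filename) tuple
    carries the IN_Q_OVERFLOW bit; skip the None timeout sentinels."""
    for event in events:
        if event is None:
            continue
        header, _type_names, _path, _filename = event
        if header.mask & IN_Q_OVERFLOW:
            return True
    return False
```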

I haven't tried 0.2.8 yet, but it appears that the IN_Q_OVERFLOW event is simply ignored and operation continues. If so, that's a step backward from the standpoint of IN_Q_OVERFLOW.

I can work on this, but would like to get advice on what to do when it occurs. Options:

What do you think?

Thanks, Kim

dsoprea commented 7 years ago

@kimyx I'm leaning toward preferring that a callback be defined to catch them. When it occurs, it's more of a fault raised in the layer than a tangible thing to be enumerated alongside file events. Do you have an argument for one versus the other?

kimyx commented 7 years ago

We used to use the old pyinotify library and it returned queue overflow like a normal event. That fit naturally into our event handlers.

But I know what you mean about this being a different category of event. We must treat this category as catastrophic, because we have to take special steps to recover when regular file events are missed. So I'd be happy to have it raise a unique exception that we can catch and handle. If you mean an actual callback function, that would generate extra complexity in pyinotify that isn't needed in our situation. It seems like lost events would be catastrophic for any user of inotify, but I'm not sure.
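The exception-based option could look something like the following sketch. EventQueueOverflowError, consume, and run_with_recovery are all invented names for illustration, not part of PyInotify; the string marker stands in for a real overflow event:

```python
# Hypothetical sketch of the exception-based approach discussed here.
# None of these names exist in PyInotify.

class EventQueueOverflowError(Exception):
    """Raised when the kernel reports IN_Q_OVERFLOW, i.e. events were lost."""

def consume(events):
    """Handle events until a simulated overflow marker appears."""
    handled = []
    for event in events:
        if event == 'IN_Q_OVERFLOW':
            raise EventQueueOverflowError('kernel event queue overflowed')
        handled.append(event)
    return handled

def run_with_recovery(events):
    """Outer loop: on overflow, the caller would rescan and restart."""
    try:
        return consume(events), False
    except EventQueueOverflowError:
        return [], True  # signal that a full filesystem rescan is needed
```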

dsoprea commented 7 years ago

That wasn't meant to imply that they're not catastrophic, just that they're out-of-band relative to normal events. I also wasn't seeing a strong benefit from exceptions, since this is a generator. Can I assume you have the whole loop, or the function that calls said loop, inside a try-catch, which you then restart when you encounter an exception?

I'm not hugely partial to one versus the other.

While we're dealing with this, let's account for other event types that might require special handling (by either us or the user). You won't necessarily be the one who does it all (though you can work on your piece separately, if you'd like), but it will have to be done before we merge back to master, assuming I/we can do it quickly. These all represent risks that need to be manageable.

If you set the buffer to be sufficiently large, shouldn't you, theoretically, not receive the overflows?

Dustin


kimyx commented 7 years ago

When an overflow happens, we have to run a full filesystem scan in a separate process to see what changes we missed. We also have to restart the pyinotify-based processes, because we're not sure whether we missed watch changes. So in our case we just want to terminate all the pyinotify-based Python processes, start fresh, and run cleanup.

Thanks for mentioning the other special event types, that does make a good case for a callback.

Yes, if we make the inotify buffer large enough, everything is fine until the buffer is no longer large enough. Our filesystems are growing continuously and we've already been bitten by overflow. There must be some fundamental limit based on how fast changes can reach the OS, but I'm not sure we've found it. Our current setting is fs.inotify.max_queued_events=3000000, after we overflowed at 1,000,000.
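For anyone checking their own limits, the knob lives under /proc/sys and can be read without shelling out to `sysctl`. The knob names are real kernel tunables; the helper itself is just an illustration and returns None where /proc/sys/fs/inotify is unavailable:

```python
def read_inotify_limit(name):
    """Read /proc/sys/fs/inotify/<name> as an int, or None if unavailable."""
    try:
        with open('/proc/sys/fs/inotify/' + name) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

For example, `read_inotify_limit('max_queued_events')` would report 3000000 on a system tuned as described above.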

It sounds like it would be better for you to decide exactly how to implement the general solution. It's not urgent; we have the buffer set large enough for now, and I can patch our local copy with simpler protection if this will take you a while. If you want to outline the desired approach in more detail, I'd be happy to help.

Thanks, Kim

kimyx commented 7 years ago

My thoughts on getting started. We should add an optional argument to event_gen:

    def event_gen(self, special_callback=None)

where the callback function looks like this:

    def special_callback(inotify_obj, header, type_names, path, filename)

This would be called (if not None) whenever an IN_Q_OVERFLOW, IN_IGNORED, or IN_UNMOUNT event mask is received. In some cases, path and filename might be empty.

If the callback returned 0, the __handle_inotify_event loop would break; otherwise, the callback would have no direct effect and the event loop would continue.
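A sketch of the proposed semantics, to make the break-on-0 behavior concrete. dispatch_events is a stand-in for the real loop, not PyInotify's actual __handle_inotify_event; the mask values come from `<sys/inotify.h>`, and None is passed where the Inotify instance would go:

```python
from collections import namedtuple

IN_UNMOUNT = 0x00002000     # values from <sys/inotify.h>
IN_Q_OVERFLOW = 0x00004000
IN_IGNORED = 0x00008000
_SPECIAL = IN_UNMOUNT | IN_Q_OVERFLOW | IN_IGNORED

# Models the event header tuple only; not PyInotify's own type.
Header = namedtuple('Header', 'wd mask cookie len')

def dispatch_events(events, special_callback=None):
    """Yield normal (header, type_names, path, filename) tuples; route
    special ones to the callback. A callback return of 0 stops the loop."""
    for event in events:
        header = event[0]
        if header.mask & _SPECIAL:
            # The first argument would be the Inotify instance; None here.
            if special_callback is not None and special_callback(None, *event) == 0:
                break
            continue  # special events never reach the normal stream
        yield event
```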

kimyx commented 7 years ago

BTW, I accidentally discovered today that pyinotify 0.2.7, at least, returns IN_IGNORED and IN_UNMOUNT events as part of the normal event stream.