How to make monitoring / rotating log handlers multiprocessing safe

jikquantopian commented 5 years ago

I've experimented with MultiProcessingSubscriber and MultiProcessingHandler and for whatever reason they don't always seem to work. For example, I had a huge amount of trouble getting them to work -- and ultimately didn't -- with running a Flask app under Werkzeug's run_simple with processes set to a >1 value.

It seems like much of what these classes are trying to accomplish -- though certainly not all of it -- is to make logging to monitoring or rotating file handlers multiprocessing safe, since if you have multiple processes checking if log files need to be reopened or rotated, there are all sorts of race conditions leading to log messages being lost.

There is, however, a solution. Using fcntl.flock, I've successfully built a rotating log handler which is -- as far as I can tell -- 100% resilient in a multiprocessing environment. Log messages are not lost, log files are not rotated prematurely, etc.

I present it below for your consideration. This isn't a finished product I'd expect to be merged into Logbook as-is, for a number of reasons, including:

I wrote it for a Python3 environment so I made no effort to ensure Python2/3 compatibility for the code.
I'm only using it in environments where fcntl.flock works, so I haven't tested what would happen in environments where it doesn't.
On other operating systems from Linux there are different ways to do file locking, so it would be nice if the code were more generic about figuring out what locking mechanisms are available and using one of them, rather than just supporting fcntl.flock.
This doesn't actually have to be its own handler -- the locking mechanism could be integrated into existing handlers, though whether that's appropriate is up for discussion.
I made no effort in my handler to support mode values other than a because that's all I'm using and other modes are complicated because you have to be careful about multiple processes truncating the log file and causing log entries to be lost.

Having said all that, I wanted to submit this here to ask whether it is something you'd be interested in merging if its deficiencies were addressed, and if so what approach you'd like to take, i.e., adding to existing handlers vs. having separate ones. If there's interest we could talk about what you think would be needed to make it release-ready and I could try to find time to hammer it into shape.

So here's the proof-of-concept code:

class FlockFileLock(object):
    def __init__(self, path):
        self.path = path
        self.file = None

    def acquire(self):
        """Returns True if file has been changed, False otherwise"""
        ret = False
        while True:
            if not self.file:
                self.file = open(self.path, 'a')
                self.fstatd = os.fstat(self.file.fileno())
                ret = True
            try:
                fcntl.flock(self.file, fcntl.LOCK_EX)
            except Exception:
                try:
                    self.file.close()
                except Exception:
                    pass
                self.file = None
                raise
            try:
                statd = os.stat(self.path)
            except FileNotFoundError:
                try:
                    self.file.close()
                except Exception:
                    pass
                self.file = None
                continue
            except Exception:
                try:
                    self.file.close()
                except Exception:
                    pass
                self.file = None
                raise
            if self.fstatd[stat.ST_DEV] == statd[stat.ST_DEV] and \
               self.fstatd[stat.ST_INO] == statd[stat.ST_INO]:
                return ret
            self.file.close()
            self.file = None

    def release(self):
        if not self.file:
            return
        fcntl.flock(self.file, fcntl.LOCK_UN)

class FlockRotatingFileHandler(logbook.FileHandler):
    def __init__(self, filename, encoding='utf-8', level=logbook.NOTSET,
                 format_string=None, delay=False, max_size=1024 * 1024,
                 backup_count=5, filter=None, bubble=False):
        self.max_size = max_size
        self.backup_count = backup_count
        super(FlockRotatingFileHandler, self).__init__(
            filename, encoding=encoding, level=level,
            format_string=format_string, delay=delay, filter=filter,
            bubble=bubble)
        self.flock = FlockFileLock(filename)

    def should_rollover(self):
        self.stream.seek(0, 2)
        return self.stream.tell() >= self.max_size

    def perform_rollover(self):
        self.stream.close()
        for x in range(self.backup_count - 1, 0, -1):
            src = '%s.%d' % (self._filename, x)
            dst = '%s.%d' % (self._filename, x + 1)
            try:
                os.rename(src, dst)
            except FileNotFoundError:
                pass
        os.rename(self._filename, self._filename + '.1')
        self._open('a')

    def emit(self, record):
        try:
            self.lock.acquire()
            changed = self.flock.acquire()
            if changed:
                self.flush()
                if self.stream:
                    try:
                        self.stream.close()
                    except Exception:
                        pass
                    self.stream = None
                self.ensure_stream_is_open()
            super(FlockRotatingFileHandler, self).emit(record)
            if self.should_rollover():
                self.perform_rollover()
        finally:
            try:
                self.lock.release()
            except Exception:
                pass
            self.flock.release()

vmalloc commented 5 years ago

I think the current implementation will be very hard to proof in multiprocessing scenarios. These cases should probably make use of better logging facilities (especially things like journald/rsyslog), which are better optimized for handling high log volume from multiple producers

jikquantopian commented 5 years ago

I think the current implementation will be very hard to proof in multiprocessing scenarios.

The person saying something can't be done should get out of the way of the person doing it.

The code above works. It just works. In the environment I'm using in, it works, 100%, every time, no matter how many processes are writing to the log file and no matter how fast or frequently they are doing it. It just works.

It will work on any Linux system writing to local log files. Full stop.

Sure, it might not be possible for it to work 100% in every environment. Who cares? If it is useful for a lot of people in a lot of environments, why not make it available to those people to use in those environments?

which are better optimized for handling high log volume from multiple producers

Sure, this is an option, but I honestly don't understand why what I'm suggesting above shouldn't also be an option for people whose requirements it satisfies.

I mean :man_shrugging: it's your call, now that I've got it written I can just cut and paste it into whatever projects I need to use it in so my problem is solved. I just don't understand why it wouldn't also be worthwhile to share that solution with others.

vmalloc commented 5 years ago

The person saying something can't be done should get out of the way of the person doing it.

I can't say I feel your tone was called for - I was just making an observation, not closing your issue or saying I won't have it any way. I think you could have been a little bit nicer with your response in this case.

I was saying it because there have been a lot of issues in the rotation/thread safety area and I know people aren't pleased with it. I can't recall exactly but I think there was another discussion (possibly via email) around multiprocessing support so my instinctive reaction was this. Having said that, given a PR with accompanying tests demonstrating the issues and showing they are solved, I'm more than open to such changes.

jikquantopian commented 5 years ago

I can't say I feel your tone was called for - I was just making an observation, not closing your issue or saying I won't have it any way. I think you could have been a little bit nicer with your response in this case.

Point taken. I apologize.

Having said that, given a PR with accompanying tests demonstrating the issues and showing they are solved, I'm more than open to such changes.

I'm not sure how I can include tests showing that the issues are solved if I don't know what they are. Are you able to be more specific than "I think there was another discussion (possibly via email) around multiprocessing support"?

I can address the issues I've enumerated above, but I can't address issues I don't know about.

Also, would your preference be to incorporate this functionality into the existing Rotating and Monitoring handlers (optional, enabled with a keyword arg) or creating new handlers?

vmalloc commented 5 years ago

Regarding the previous issues I was referring to the default rotating handler, not to your solution. In general, I think this could have a place as a built in handler in Logbook, but there are tradeoffs which I'm still not sure what to make of, for example the high cost of an flock for each record emitting (this has a very high overhead over NFS for example)

I haven't given it much thought yet, but perhaps this can fit as an optional flag to the regular file and rotating handlers, so that it could be an "opt in" thing? What do you think?

jikquantopian commented 5 years ago

I haven't given it much thought yet, but perhaps this can fit as an optional flag to the regular file and rotating handlers, so that it could be an "opt in" thing? What do you think?

I will look at that when I get a chance.

I wouldn't recommend using this over NFS. ;-) I think it's worth trying to add this to MonitoringFileHandler and RotatingFileHandler but not the others. I'm not sure it's feasible to add it to TimedRotatingFileHandler because when there's no way for multiple processes running one of those to know if one of the other processes has already rotated.

getlogbook / logbook

How to make monitoring / rotating log handlers multiprocessing safe #287