getsentry / raven-python

Raven is the legacy Python client for Sentry (getsentry.com) — replaced by sentry-python
https://sentry.io
BSD 3-Clause "New" or "Revised" License

Is raven thread and process safe? #806

Open jinzhao1994 opened 8 years ago

jinzhao1994 commented 8 years ago

When I test it with multiple processes, my server does not receive any messages. I tried to find the reason. It seems that the messages go into the Queue but are never picked out. In fact, although I configure a handler for each process, they all share the same handler and the same transport object. The worker object blocks at the Queue.get method.

I configure the handler like this in each process.

import logging
import os

from raven import Client
from raven.handlers.logging import SentryHandler

logger = logging.getLogger()
logger.handlers = []
client = Client(os.getenv('SENTRY_DSN'))
handler = SentryHandler(client)
handler.setLevel(logging.ERROR)
logger.addHandler(handler)

The worker object blocks at this statement, in raven/transport/threaded.py, line 133: record = self._queue.get()

    def _target(self):
        while True:
            record = self._queue.get()  # This statement blocks.
            try:
                if record is self._terminator:
                    break
                callback, args, kwargs = record
                try:
                    callback(*args, **kwargs)
                except Exception:
                    logger.error('Failed processing job', exc_info=True)
            finally:
                self._queue.task_done()

            sleep(0)

Only the main process sends messages successfully. Btw, my raven version is 5.2.0.

jinzhao1994 commented 8 years ago

I have found a way to fix it. If I don't log anything before I start the child processes, it runs well. But I still don't know why. The MultiProcessRotatingFileHandler always runs well.

jinzhao1994 commented 8 years ago

The _registry of Client is a class (static) variable. I moved the statement

_registry = TransportRegistry(transports=default_transports)

into the __init__ method and added a self. prefix.

There is no problem after I do that. But I'm still confused why it runs well if I don't call logging until after I start the child processes.
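
In other words, the change is roughly this (a minimal sketch of the idea, not the actual raven diff):

    from raven.transport.registry import TransportRegistry, default_transports


    class Client(object):
        # Before: one registry shared as a class attribute, so the transport
        # (and its worker thread) is shared by every Client in every process.
        # _registry = TransportRegistry(transports=default_transports)

        def __init__(self, dsn=None, **kwargs):
            # After: each Client instance builds its own registry, so a Client
            # created in a child process gets a fresh transport and worker.
            self._registry = TransportRegistry(transports=default_transports)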

jinzhao1994 commented 8 years ago

If I log something in the main process, that starts the worker thread, which consumes the queue. Then I start some child processes. A child process copies the memory of the main process, so it also has the queue object, but it does not have a consumer thread, because threads are not inherited across fork. So only the main process can send logs to Sentry.

If I don't log anything until after I start the child processes, the consumer thread and the queue are not created yet. The child processes copy memory that does not contain the queue, so each of them builds its own. That's why it works.
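
A minimal standalone demonstration of this (not raven code; it just mimics the queue-plus-consumer-thread setup, on a Unix system):

    import os
    import threading
    import time
    try:
        import queue  # Python 3
    except ImportError:
        import Queue as queue  # Python 2

    q = queue.Queue()

    def consumer():
        while True:
            item = q.get()
            print('consumed %r in pid %d' % (item, os.getpid()))

    # Start the consumer before forking, the same way raven's threaded
    # transport starts its worker the first time an event is sent.
    t = threading.Thread(target=consumer)
    t.daemon = True
    t.start()

    q.put('from parent')
    time.sleep(0.5)  # the parent's consumer drains this item

    pid = os.fork()
    if pid == 0:
        # Child: the queue object was copied, but the consumer thread was
        # not, so this item is never picked up.
        q.put('from child')
        time.sleep(1)
        print('child pid %d still has %d queued item(s)' % (os.getpid(), q.qsize()))
        os._exit(0)

    os.waitpid(pid, 0)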

I'm confused about why _registry is an attribute of the Client class rather than of the client object. If it were an instance attribute, configured in each process, it would run well.

mitsuhiko commented 8 years ago

Can you share which version of Python, which interpreter and which OS you are on?

jinzhao1994 commented 8 years ago

Here is my OS version.

Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt25-2+deb8u2 (2016-06-25)

Here is my Python version.

Python 2.7.9 (default, Mar  1 2015, 12:57:24) 
[GCC 4.9.2] on linux2

mitsuhiko commented 8 years ago

Are you spawning multiple processes with os.fork? If yes, are you creating the client after or before fork?

I just want to make sure I understand the entire situation here.

jinzhao1994 commented 8 years ago

Here is my test code.

#!/usr/local/bin/python
# coding: utf-8

from raven import Client
from raven.handlers.logging import SentryHandler
from raven.conf import setup_logging
import multiprocessing
import thread
import logging
import time
import os

def init_log():
    root_logger = logging.getLogger()
    root_logger.handlers = []
    client = Client(os.getenv('SENTRY_DSN'))
    handler = SentryHandler(client)
    handler.setLevel(logging.ERROR)
    root_logger.addHandler(handler)

def work():
    while True:
        time.sleep(1)
        logging.error('child thread log')

def run():
    init_log()
    for i in xrange(1):
        thread.start_new_thread(work, ()) 
    while True:
        time.sleep(1)
        logging.error('process ')

def main():
    init_log()
    for i in xrange(2):
        p = multiprocessing.Process(target=run, args=())
        p.start()
    logging.error('test start')
    while True:
        time.sleep(1)
        logging.error('main process')

if __name__ == '__main__':
    main()

This code runs well, but if I move the statement logging.error('test start') above the loop that starts the processes, like this,

def main():
    init_log()
    logging.error('test start')
    for i in xrange(2):
        p = multiprocessing.Process(target=run, args=())
        p.start()
    while True:
        time.sleep(1)
        logging.error('main process')

My child processes cannot send logs to Sentry.

jinzhao1994 commented 8 years ago

The reason I found is that raven builds a thread to consume the log queue, and that thread is only started after the first logging call. The code is in raven/transport/threaded.py. When I first log in the main process, it runs the get_worker method of ThreadedHTTPTransport and constructs an AsyncWorker object, which starts the consumer thread in its constructor. Then I start my child processes. A child process keeps the same memory as the main process, so the AsyncWorker object still exists, but its constructor is never run again in the child, and there is no thread consuming the log message queue.

jinzhao1994 commented 8 years ago

Although I build a new Client object after I start the child processes, the ThreadedHTTPTransport object is stored in the class attribute _registry of the Client class, so it is copied from the main process rather than constructed in the child process.

mitsuhiko commented 8 years ago

I will look into this. Unfortunately mixing threads with fork is dangerous territory and totally not supported. I wonder if we can do something there that makes sense.

jinzhao1994 commented 8 years ago

I moved the _registry class attribute to an instance attribute, and then it seems to work well. I've also thought of another way, maybe prettier (a rough sketch follows the list below):

  1. Store the pid when the Raven object is built.
  2. Force-construct a new Raven object and all class attributes when the pid has changed.
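
A minimal sketch of that idea (hypothetical class and attribute names, not the actual raven code):

    import os
    import threading
    try:
        import queue  # Python 3
    except ImportError:
        import Queue as queue  # Python 2


    class ForkAwareWorker(object):
        """Hypothetical async worker that restarts its consumer thread after a fork."""

        def __init__(self):
            self._queue = queue.Queue()
            self._start_thread()

        def _start_thread(self):
            # Remember which process started the consumer thread.
            self._pid = os.getpid()
            self._thread = threading.Thread(target=self._target)
            self._thread.daemon = True
            self._thread.start()

        def submit(self, callback, *args, **kwargs):
            # After a fork the inherited thread does not exist in this process,
            # so rebuild the queue and start a fresh consumer.
            if os.getpid() != self._pid:
                self._queue = queue.Queue()
                self._start_thread()
            self._queue.put((callback, args, kwargs))

        def _target(self):
            while True:
                callback, args, kwargs = self._queue.get()
                try:
                    callback(*args, **kwargs)
                finally:
                    self._queue.task_done()
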
jinzhao1994 commented 8 years ago

Here is my change. I forked the repository and made a commit: My change

I've only done some simple tests of my change. It is just a suggestion.

mitsuhiko commented 8 years ago

Something along those lines is most likely what we will do. However, this is a huge can of worms for many reasons, so I want to see first why we end up in this situation at all. An argument could be made that this is something the transports are supposed to handle.

nateyoder commented 7 years ago

@jinzhao1994 sorry to resurrect an old thread but how did you end up working around this issue? It seems as though you have deleted your fork?

jinzhao1994 commented 7 years ago

Sorry... I forgot there was a link there... I thought they would work on it... In my code, I just wrote an init-log function. This function is called first in the new process.

import logging

def init_log(sentry_dsn, sentry_level=logging.ERROR):
    import raven
    from raven.handlers.logging import SentryHandler
    from raven.transport.registry import TransportRegistry, default_transports
    raven.Raven = None
    raven.Client.logger = logging.getLogger('raven')
    raven.Client._registry = TransportRegistry(transports=default_transports)
    client = raven.Client(sentry_dsn)
    handler = SentryHandler(client)
    handler.setLevel(sentry_level)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)

Or you can simply avoid logging before forking the processes.
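
For example (a sketch assuming the init_log function above; not from the original comment), each child calls it at the top of its target so the transport registry and worker thread are rebuilt in that process:

    import logging
    import multiprocessing
    import os

    def run():
        # Re-initialize raven inside the child so the transport and its
        # worker thread are created here, not inherited from the parent.
        init_log(os.getenv('SENTRY_DSN'))
        logging.error('logged from child pid %d', os.getpid())

    if __name__ == '__main__':
        init_log(os.getenv('SENTRY_DSN'))  # the parent gets its own client
        for _ in range(2):
            multiprocessing.Process(target=run).start()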

nateyoder commented 7 years ago

@jinzhao1994 Thank you so much for the help! Really appreciate it!