jinzhao1994 opened this issue 8 years ago
I have found a way to fix it. If I don't log anything before I start the child processes, it runs fine. But I still don't know why. The MultiProcessRotatingFileHandler always works fine.
The `_registry` of `Client` is a class-level (static) attribute. I moved the statement `_registry = TransportRegistry(transports=default_transports)` into the `__init__` method and added a `self.` prefix.
There is no problem after I do this. But I'm still confused about why it runs fine as long as I don't log anything until after the child processes have started.
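In code, the change described above looks roughly like this (a simplified sketch; raven's real `Client.__init__` takes many more arguments):

```python
from raven.transport.registry import TransportRegistry, default_transports


class Client(object):
    # Before: a class attribute, shared by all instances and inherited
    # by every forked child process:
    #
    #     _registry = TransportRegistry(transports=default_transports)

    def __init__(self, dsn=None):
        # After: each Client builds its own registry, so a client
        # constructed inside a child process also constructs its own
        # transport (and with it, its own worker thread).
        self._registry = TransportRegistry(transports=default_transports)
        self.dsn = dsn
```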
If I send some logs in the main process, raven starts the worker thread, which consumes the queue. Then I start some child processes. A child process copies the memory of the main process, so it also has the queue variable. But the child process does not have a consumer thread, so only the main process can send logs to Sentry.
If I don't send any logs until after I start the child processes, the consumer thread and the queue are never created. The child processes copy memory that contains no queue variable, so each builds its own. That's why it works.
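The effect can be reproduced without raven at all. A minimal demonstration, assuming a fork-based `multiprocessing` start method (the default on Linux, as in this report):

```python
import multiprocessing
import threading
import time
try:
    import Queue as queue  # Python 2, as in the test code below
except ImportError:
    import queue           # Python 3

q = queue.Queue()

def consume():
    while True:
        print('consumed: %r' % q.get())

def child():
    q.put('from child')  # lands in the child's private copy of q;
    time.sleep(1)        # no consumer thread exists in the child,
                         # so this item is never taken off the queue

if __name__ == '__main__':
    t = threading.Thread(target=consume)  # started lazily in raven,
    t.daemon = True                       # on the first captured log
    t.start()
    q.put('from parent')                  # printed by the consumer

    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
    time.sleep(1)
```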
I'm confused about why we make `_registry` an attribute of the `Client` class rather than of the client object. If we used it as an object attribute, configured for each process, it would work fine.
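For illustration, here is the difference in plain Python (a generic example, not raven code):

```python
class C(object):
    shared = []          # class attribute: one list for all instances

    def __init__(self):
        self.own = []    # instance attribute: fresh per instance

a, b = C(), C()
a.shared.append(1)
print(b.shared)  # [1] -- visible through every instance (and inherited by forks)
a.own.append(1)
print(b.own)     # []
```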
Can you share which version of Python, which interpreter and which OS you are on?
Here is my OS version:

```
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt25-2+deb8u2 (2016-06-25)
```

Here is my Python version:

```
Python 2.7.9 (default, Mar 1 2015, 12:57:24)
[GCC 4.9.2] on linux2
```
Are you spawning multiple processes with `os.fork`? If yes, are you creating the client before or after the fork?
I just want to make sure I understand the entire situation here.
Here is my test code:

```python
#!/usr/local/bin/python
# coding: utf-8
from raven import Client
from raven.handlers.logging import SentryHandler
from raven.conf import setup_logging
import multiprocessing
import thread
import logging
import time
import os


def init_log():
    root_logger = logging.getLogger()
    root_logger.handlers = []
    client = Client(os.getenv('SENTRY_DSN'))
    handler = SentryHandler(client)
    handler.setLevel(logging.ERROR)
    root_logger.addHandler(handler)


def work():
    while True:
        time.sleep(1)
        logging.error('child thread log')


def run():
    init_log()
    for i in xrange(1):
        thread.start_new_thread(work, ())
    while True:
        time.sleep(1)
        logging.error('process ')


def main():
    init_log()
    for i in xrange(2):
        p = multiprocessing.Process(target=run, args=())
        p.start()
    logging.error('test start')
    while True:
        time.sleep(1)
        logging.error('main process')


if __name__ == '__main__':
    main()
```
This code runs fine, but if I move the statement `logging.error('test start')` up before the loop, like this:
```python
def main():
    init_log()
    logging.error('test start')
    for i in xrange(2):
        p = multiprocessing.Process(target=run, args=())
        p.start()
    while True:
        time.sleep(1)
        logging.error('main process')
```
then my child processes cannot send logs to Sentry.
The reason I found is that raven builds a thread to consume the logs, and that thread is started only after the first logging statement. The code is in raven/transport/threaded.py.
When I log for the first time in the main process, it runs the `get_worker` method of `ThreadedHTTPTransport` and constructs an `AsyncWorker` object. This object starts the consumer thread when it is constructed. Then I start my child processes. Each child process inherits the memory of the main process, so the `AsyncWorker` object still exists there, but its constructor was never called in the child, and there is no thread consuming the log message queue.
Although I build a new `Client` object after starting the child processes, the `ThreadedHTTPTransport` object is stored in the class attribute `_registry` of class `Client`, so the `ThreadedHTTPTransport` is copied from the main process rather than constructed in the child process.
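This is consistent with how `fork()` behaves in CPython: the child inherits a copy of every object, but only the thread that called `fork()` keeps running. A standalone demonstration (Linux-only, no raven involved):

```python
import os
import threading
import time

def worker():
    while True:
        time.sleep(0.5)

t = threading.Thread(target=worker)
t.daemon = True
t.start()
print('parent active threads: %d' % threading.active_count())  # 2

pid = os.fork()
if pid == 0:
    # The Thread object was copied into the child, but no OS thread
    # is running behind it any more.
    print('child active threads: %d' % threading.active_count())  # 1
    os._exit(0)
os.waitpid(pid, 0)
```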
I will look into this. Unfortunately mixing threads with fork is dangerous territory and totally not supported. I wonder if we can do something there that makes sense.
I moved `_registry` from a class attribute to an object attribute, and it seems to work fine now.
I've thought of another way that may be cleaner.
Here is my change. I forked the repository and made a commit: My change
I've only done some simple tests on my change. It is just a suggestion.
Something along those lines is most likely what we will do. However, this is a huge can of worms for many reasons, so I want to understand first why we end up in this situation at all. An argument could be made that this is something the transports are supposed to handle.
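One shape such transport-level handling could take is fork detection: the worker remembers which PID started its consumer thread and restarts the thread if the PID has changed. A hypothetical sketch, not raven's actual code:

```python
import os
import threading
try:
    import Queue as queue  # Python 2
except ImportError:
    import queue           # Python 3

class ForkAwareWorker(object):
    """Restart the consumer thread if we are running in a different
    process than the one that started it (hypothetical illustration)."""

    def __init__(self):
        self._queue = queue.Queue()
        self._lock = threading.Lock()
        self._thread = None
        self._pid = None

    def _ensure_thread(self):
        with self._lock:
            if self._thread is None or self._pid != os.getpid():
                # After fork() the copied Thread object has no OS thread
                # behind it in the child, so start a fresh one.
                self._pid = os.getpid()
                self._thread = threading.Thread(target=self._run)
                self._thread.daemon = True
                self._thread.start()

    def send(self, record):
        self._ensure_thread()
        self._queue.put(record)

    def _run(self):
        while True:
            record = self._queue.get()
            # ... deliver `record` over HTTP here ...
```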
@jinzhao1994 sorry to resurrect an old thread but how did you end up working around this issue? It seems as though you have deleted your fork?
Sorry... I forgot there was a link... I thought they would work with it... In my code, I just wrote an init-log function. It is called first in each new process:
```python
import logging

def init_log(sentry_dsn, sentry_level=logging.ERROR):
    import raven
    from raven.handlers.logging import SentryHandler
    from raven.transport.registry import TransportRegistry, default_transports

    # Reset the module- and class-level state inherited from the parent
    # process, so this process builds its own transport and worker thread.
    raven.Raven = None
    raven.Client.logger = logging.getLogger('raven')
    raven.Client._registry = TransportRegistry(transports=default_transports)

    client = raven.Client(sentry_dsn)
    handler = SentryHandler(client)
    handler.setLevel(sentry_level)
    # Attach to the root logger, as in the earlier test code.
    logging.getLogger().addHandler(handler)
```
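For example, a minimal usage sketch of the `init_log` workaround above, assuming `SENTRY_DSN` is set in the environment as in the earlier test code:

```python
import os
import logging
import multiprocessing

def run():
    # First thing in the child: rebuild raven's state for this process.
    init_log(os.getenv('SENTRY_DSN'))
    logging.error('hello from pid %d' % os.getpid())

if __name__ == '__main__':
    p = multiprocessing.Process(target=run)
    p.start()
    p.join()
```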
Or you can simply avoid logging at all before forking the processes.
@jinzhao1994 Thank you so much for the help! Really appreciate it!
When I test it with multiple processes, my server does not receive any messages. I tried to find the reason: the messages go into the queue, but they are never picked out. In fact, although I configure a handler in each process, they all end up using the same handler and the same transport object, and the worker object blocks on the `Queue.get` method, at raven/transport/threaded.py line 133: `record = self._queue.get()`. Only the main process sends messages successfully. Btw, my raven version is 5.2.0.