coralogix / python-coralogix-sdk

Coralogix Python SDK
https://coralogix.com/integrations/coralogix-python-integration/
Apache License 2.0

possible deadlock when using multiprocessing #17

Open yuvalshi0 opened 12 months ago

yuvalshi0 commented 12 months ago

Heyo, we are using the Coralogix handler, version 2.0.5. We recently implemented a feature that uses multiprocessing in Python; the feature spins up a Pool of processes. In production we do not terminate and restart the pool, but in our tests we do that a lot.

After implementing the feature, we saw several cases where our processes would hang intermittently. Upon investigation I was lucky enough to reproduce the bug locally: one process was hanging, and using py-spy I dumped the stacks of the processes involved. This is what I found:

The main process (waiting for the pool to die):

> py-spy dump --pid 330362

Thread 330362 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:43)
    join (multiprocessing/process.py:149)
    _terminate_pool (multiprocessing/pool.py:732)
    __call__ (multiprocessing/util.py:224)
    terminate (multiprocessing/pool.py:657)
    close (parallel.py:168)

The hanging process:

> py-spy dump --pid 331186

Thread 331186 (idle): "MainThread"
    send_request (coralogix/http.py:45)
    _send_bulk (coralogix/manager.py:239)
    flush (coralogix/manager.py:278)
    _handler (coralogix/manager.py:350)
    handler (coralogix/manager.py:341)
    ident (threading.py:1154)
    _shutdown (threading.py:1540)

The process appears to hang inside the Coralogix HTTP module. Looking at the code, the specific line where it hangs seems to be cls._mutex.acquire(), so the child process is stuck waiting on that mutex while the main process waits for the child to exit, which is a deadlock. For now we have disabled Coralogix in our CI.
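One general way a forked child can block forever on a mutex, which may or may not be what happens inside the SDK: if another thread holds a threading.Lock at the moment of the fork, the child gets a copy of the lock in its locked state, but not the thread that would release it. The sketch below illustrates only that mechanism on a POSIX system. The names lock, hold_lock and child_can_acquire_after_fork are hypothetical and are not part of the Coralogix SDK.

```python
import os
import threading

# Hypothetical stand-in for a class-level mutex shared by worker threads.
lock = threading.Lock()


def hold_lock(started, release):
    """Background thread: take the lock and keep it until told to let go."""
    with lock:
        started.set()
        release.wait()


def child_can_acquire_after_fork(timeout=0.5):
    """Fork while another thread holds the lock; report whether the child got it."""
    started = threading.Event()
    release = threading.Event()
    holder = threading.Thread(target=hold_lock, args=(started, release))
    holder.start()
    started.wait()  # the lock is now held by the background thread

    pid = os.fork()  # POSIX only
    if pid == 0:
        # Child: only the forking thread exists here. The holder thread was
        # not copied, but the lock's "locked" state was, so nothing in this
        # process will ever release it.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)

    # Parent: collect the child's verdict, then let the holder thread finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    holder.join()
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    print("child acquired lock:", child_can_acquire_after_fork())
```

Without the timeout, the acquire call in the child would block indefinitely, which would look like the hanging process in the dump above.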

daidokoro commented 11 months ago

@yuvalshi0 , thanks for raising this issue.

I'm having some difficulty replicating the issue in testing. Would you be able to provide code snippet(s) showing how the SDK is being used?

yuvalshi0 commented 11 months ago

Heyo @daidokoro, here is a minimal reproducible example:

from multiprocessing import Pool
import logging
from coralogix.handlers import CoralogixLogger

CORALOGIX_PRIVATE_KEY = "<PRIVATE_KEY_HERE>"

handler = CoralogixLogger(
    private_key=CORALOGIX_PRIVATE_KEY,
    app_name="dabug",
    subsystem="Subsystem",
)
logger = logging.Logger("dabug")
logger.addHandler(handler)

def some_func(i):
    logger.info(f"i is {i}")
    print(i)

def test_logger_issue():
    with Pool() as pool:
        pool.map(some_func, range(1000))

To run:

pytest <filename>

This causes pytest to hang forever.

Note that it might take a few runs for the hang to actually happen. I used pytest-repeat to run the test in a loop until the deadlock occurs:

pytest <filename> --count=1000

Let me know if you need any more help.