IanHarvey / bluepy

Python interface to Bluetooth LE on Linux
Other
1.6k stars 490 forks source link

bluepy deadlocks during disconnect (with supporting info and solution). #344

Open dsmith9000 opened 5 years ago

dsmith9000 commented 5 years ago

(See Below)

dsmith9000 commented 5 years ago

Test System: Raspberry Pi 3B+ with Raspbian Stretch 9.6 Problem: After running for various periods of time (hours), I notice that bluepy will occasionally appear to freeze. I was able to get a stacktrace at one of these instances (pasted below) and it seems to indicate that it is happening during attempt to disconnect. Specifically, in btle.py in the _stopHelper method at the line self._helper.wait():

    def _stopHelper(self):
        if self._helper is not None:
            DBG("Stopping ", helperExe)
            self._poller.unregister(self._helper.stdout)
            self._helper.stdin.write("quit\n")
            self._helper.stdin.flush()
            self._helper.wait()
            self._helper = None
        if self._stderr is not None:
            self._stderr.close()
            self._stderr = None

This is a call to a subprocess.Popen method. The documentation of subprocess seems to indicate that a deadlock can occur when calling this function when the subprocess was opened with stdout=PIPE or stderr=PIPE. Looking in the _startHelper method, the stdout is indeed set to PIPE. The docs indicate that wait can now be called with a timeout to cause an exception if it deadlocks. See supporting info below:

Stacktrace:

^CException ignored in: <bound method Peripheral.__del__ of <bluepy.btle.Peripheral object at 0x75c5af50>>
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/bluepy/btle.py", line 630, in __del__
    self.disconnect()
  File "/usr/local/lib/python3.5/dist-packages/bluepy/btle.py", line 455, in disconnect
    self._stopHelper()
  File "/usr/local/lib/python3.5/dist-packages/bluepy/btle.py", line 294, in _stopHelper
    self._helper.wait()
  File "/usr/lib/python3.5/subprocess.py", line 1389, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib/python3.5/subprocess.py", line 1339, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)

Documentation of Popen.wait() from: https://docs.python.org/3/library/subprocess.html Popen.wait(timeout=None) Wait for child process to terminate. Set and return returncode attribute. If the process does not terminate after timeout seconds, raise a TimeoutExpired exception. It is safe to catch this exception and retry the wait. Note: This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use Popen.communicate() when using pipes to avoid that.

Note:
The function is implemented using a busy loop (non-blocking call and short sleeps). Use the asyncio module for an asynchronous wait: see asyncio.create_subprocess_exec.
Changed in version 3.3: timeout was added.

Potential bluepy solutions:

  • Call wait with a timeout and let the exception timeout break the deadlock
  • Use Popen.communicate instead (as suggested by the docs)
vladisnik commented 5 years ago

I confirm, ran into when bluetooth host device gone to shutdown. Bluepy-helper process goes to state D.

jayneev commented 2 years ago

I am also facing this deadlock issue leading to Broken Pipe. Can this issue root cause analysis and corresponding fixes be considered as critical or high priority?

dsmith9000 commented 2 years ago

I am also facing this deadlock issue leading to Broken Pipe. Can this issue root cause analysis and corresponding fixes be considered as critical or high priority?

@jayneev I've developed and used the following monkey-patched workaround until the issue is patched in the trunk. In my testing, it has completely resolved this issue for me. You can simply copy the 'patched_stopHelper' function into your code and then monkey-patch it into the bluepy code as below.

Note that I cut and pasted the code below from my project, so there may be some other imports that are required.


# Import bluepy module
import bluepy.btle as bluepy
import subprocess

if True:
    ##############################################################################
    # Patched workaround routine
    ##############################################################################
    def patched_stopHelper(self):
        """
        PATCH THE EXISTING ROUTINE TO FIX POTENTIAL DEADLOCK THAT OCCURS WHEN .wait CALLED WITH stdout/stderr = PIPE.
        Path the existing routine to fix a potential deadlock that occurs when .wait was called with no timeout and
        with stdout/stderr=subprocess.PIPE.
        See docs for .wait at: https://docs.python.org/3/library/subprocess.html for a discussion of the issue.
        :return:
        """
        if self._helper is not None:
            bluepy.DBG("Stopping ", bluepy.helperExe)
            self._poller.unregister(self._helper.stdout)
            try:
                self._helper.stdin.write("quit\n")
                self._helper.stdin.flush()
            except Exception as ex:
                logger.error("Exception: {} - in patched 'BluepyHelper._stopHelper'.".format(ex))

            # ORIGINAL LINE
            # self._helper.wait()

            # BEGIN MODIFIED SECTION
            try:
                # Set a 10 second timeout to catch deadlocks
                self._helper.wait(timeout=10)

            except (subprocess.TimeoutExpired, Exception) as ex:
                msg = "Exception: {} - DEADLOCK OCCURRED AND TIMED OUT! Killing child process.".format(ex)
                logger.error(msg)
                # Kill the child process
                try:
                    self._helper.kill()
                    self._helper.communicate(timeout=5)     # Per docs: read after kill to clear pipes which allows process to die
                    if self._helper.poll() is None:
                        logger.error("KILL RESULT: Process {} doesn't report self as killed!".format(self._helper.pid))
                except Exception as ex2:
                    logger.error("Exception: {} - OCCURRED DURING ATTEMPT TO KILL CHILD PROCESS.".format(ex2))

            # END MODIFIED SECTION

            self._helper = None
        if self._stderr is not None:
            self._stderr.close()
            self._stderr = None

    # MONKEY-PATCH 'BluepyHelper._stopHelper' to replace it with the 'patched_stopHelper' routine.
    # Since this modifies the class, it automatically updates all of the existing objects (and any new ones) that were
    # created from the class (like how class variables are shared by all objects of the class).
    bluepy.BluepyHelper._stopHelper = patched_stopHelper
jayneev commented 2 years ago

Thanks a lot @programmer7010 for such a good quick help, I will try this fix and update on test results soon.