bit-team / backintime

Back In Time - An easy-to-use backup tool for GNU Linux using rsync in the back
https://backintime.readthedocs.io
GNU General Public License v2.0
1.92k stars 180 forks source link

[Unit tests] Popen() warning: subprocess is still running (non-deterministic). #1292

Open aryoda opened 1 year ago

aryoda commented 1 year ago

Symptoms

When I execute unit tests with common/make test-v I get sometimes warnings like this (non-deterministically):

/usr/lib64/python3.6/subprocess.py:786: ResourceWarning: subprocess 17750 is still running

Expected behavior

Analysis

Looking into the code locations I can see that subprocesses are mostly spawned with subprocess.Popen() + communicate() code pattern to perform async subprocess calls:

            proc = subprocess.Popen(ssh,
                                    stdout = subprocess.PIPE,
                                    stderr = subprocess.PIPE,
                                    universal_newlines = True)
            out, err = proc.communicate()

communicate() will

In case of errors to communicate() with the subprocess it is not killed.

Possible solutions

communicate with proc.kill after timeout

There is a documented best-practice code snippet for this in the documentation:

proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

with to add context management (object lifecycle)

Alternatively this code pattern could be used

with Popen(["ifconfig"], stdout=PIPE) as proc:
    # do something, eg. communicate()

https://docs.python.org/3/library/subprocess.html#subprocess.Popen

Popen objects are supported as context managers via the with statement: on exit, standard file descriptors are closed, and the process is waited for.

Impact

Using with requires context manager support from Popen() and this supported only for Python version 3.2 or higher.

Changed in version 3.2: Added context manager support.

Using with means that BiP will no longer work for older Python versions (before 3.2).

Others

Check if every code location with Popen() does use communicate() too

buhtz commented 1 year ago

Thanks for analyzing this. I saw that problem, too but couldn't figure out what it is. ;) But you have to help me a bit. I still don't understand why this problem appear in our testing code.

I vote for with solution. We don't support 3.2. Based on the current Travis configuration we support Python 3.6 or higher. This is enough even for very old Debian. (I am a Debian loving Debian users.) ;)

aryoda commented 1 year ago

I still don't understand why this problem appear in our testing code.

A unit tests uses the "object under test" ("OuT", eg. a class or even a single function) only for a short time.

I the unit tests calls an OuT function with Popen() inside the unit test may finish before the spawned sub process does end because it may take a few "millisecond" until it ends/shuts down. This is non-deterministic (eg. the CPU scheduler decides who gets when and how much CPU time to execute code).

Note that Popen works async (it returns before the sub process ends) so that the Python code can continue "in parallel".

Edit: So it may never be a problem in BiT itself but only during unit testing. But since unit tests shall succeed and no (unexpected) warning shall occur it is a problem eg. for TravisCI because a failing test may be a "false positive".

buhtz commented 1 year ago

Ah thanks. I got it now. ;)

Then maybe such processed should be killed in a unittest.TestCase.TearDown() (or TearDownClass()) method. I need to take a closer look. But what is in my mind is that each test_*() add the process to a member (list) of the test class. And at the end (in the TearDown() method) it iterates on that list and check if that processes should be killed or not. This would prevent us from using a timeout.

aryoda commented 1 year ago

IMHO the running subprocess may occur also in BiT (but very rarely and only in case of errors).

I will fix this in the BiT code directly, this is the cleanest solution I think (and not much work).

emtiu commented 1 year ago

Given the choice between the two solutions above, I would also prefer with, because it's much cleaner and (as far as I understand it) exactly fit for this purpose.

aryoda commented 1 year ago

Before changing the Popen calls I have checked the code coverage report to see which Popen calls are NOT covered by any unit test.

Result:

About 50 % of the Popen calls are covered by at least one unit test, esp. backup- and ssh-related calls.

The not covered calls are mostly not central to backup functionality.

Overall risk estimation of this change: low

Reasons:

Details: The following methods are currently not covered by a unit test:


qt/serviceHelper.py#111:   def _which(self, exe, fallback): 
common/encfstools.py#80:   def _mount(self):
common/encfstools.py#154:  def checkVersion(self): 
common/encfstools.py#362:  class Encode: def startProcess(self): 
common/encfstools.py#585:  class Decode: def startProcess(self): 
common/snapshots.py#1562:  def statFreeSpaceSsh(self): 
common/sshMaxArg.py#36:    def maxArgLength(config, mid = 1048320): 
common/sshtools.py#1058:   def sshCopyId(pubkey, user, host, port='22', askPass='backintime-askpass', cipher=None):
common/tools.py#791:       dev checkHomeEncrypt()
common/tools.py#1845:      dev shutdown(self)
common/tools.py#1862:      dev unity7(self)
qt/app#1018.py:            def openManPage(self, man_page):
qt/plugins/notifyplugin.py#61:             def message(self, profile_id, profile_name, level, message, timeout):
qt/plugins/qt4plugin.py#55 class QtPlugin: def processBegin(self):
qt/qtsystrayicon.py#203:   def onStartBIT(self):
qt/snapshotsdialog.py#348: def btnDiffClicked(self):