RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.27k stars 1.26k forks source link

pydrake: `matplotlib.animation` import causes freezing? #14075

Closed EricCousineau-TRI closed 1 year ago

EricCousineau-TRI commented 4 years ago

On both my laptop and my desktop, drake @ c16eeeaea8 with the following test seems to freeze (at the last trace statement):

$ bazel run //bindings/pydrake:py/all_test -- --trace=user
...
 --- modulename: pyplot_visualizer, funcname: <module>                                                 
pyplot_visualizer.py(1): import matplotlib                                                             
pyplot_visualizer.py(2): import matplotlib.animation as animation 
...
 --- modulename: animation, funcname: _handle_subprocess
animation.py(625):         _, err = process.communicate()

No idea what changed to make this happen, and not sure if this on my machine or other's too. Ran into this while working on #14072

I worked around it (for the sake of doing the PR) by just commenting out the import of matplotlib.animation.

If I import it in isolation, it's totally fine:

python3 -c 'import matplotlib.animation`

FYI @RussTedrake

EricCousineau-TRI commented 4 years ago

I'm wondering if it has something to do with Jupyter stuff being imported too...

jwnimmer-tri commented 4 years ago

It passes under bazel test. It's only bazel run that fails. Here's some diffs from the trace of where those two executions start to differ:

diff -u bazel-test.log bazel-run.log

@@ -154382,10 +154400,7 @@
 __init__.py(569):             if _is_writable_dir(p):
  --- modulename: __init__, funcname: _is_writable_dir
 __init__.py(215):     return os.access(p, os.W_OK) and os.path.isdir(p)
-__init__.py(579):     return _create_tmp_config_dir()
- --- modulename: __init__, funcname: _create_tmp_config_dir
-__init__.py(510):         tempfile.mkdtemp(prefix='matplotlib-'))
-__init__.py(511):     return configdir
+__init__.py(570):                 return p
 __init__.py(738):         if os.path.isfile(fname):
 __init__.py(737):     for fname in gen_candidates():
  --- modulename: __init__, funcname: gen_candidates
@@ -167327,23 +167342,8 @@
 __init__.py(1249): def use(arg, warn=True, force=False):
 __init__.py(1309): try:
 __init__.py(1310):     use(os.environ['MPLBACKEND'])
- --- modulename: __init__, funcname: use
-__init__.py(1275):     if arg.startswith('module://'):
-__init__.py(1279):         arg = arg.lower()
-__init__.py(1280):         name = validate_backend(arg)
-__init__.py(1283):     if 'matplotlib.backends' in sys.modules:
-__init__.py(1298):         need_reload = False
-__init__.py(1301):     rcParams['backend'] = name
- --- modulename: __init__, funcname: __setitem__
-__init__.py(797):         try:
-__init__.py(798):             if key in _deprecated_map:
-__init__.py(804):             elif key in _deprecated_set and val is not None:
-__init__.py(807):             elif key in _deprecated_ignore_map:
-__init__.py(812):             elif key in _obsolete_set:
-__init__.py(816):             try:
-__init__.py(817):                 cval = self.validate[key](val)
-__init__.py(820):             dict.__setitem__(self, key, cval)
-__init__.py(1305):     if need_reload:
+__init__.py(1311): except KeyError:
+__init__.py(1312):     pass
 __init__.py(1315): def get_backend():
 __init__.py(1320): def interactive(b):
 __init__.py(1329): def is_interactive():

Current guess: matplotlib.get_backend() returns ~extra~ different stuff when sandboxing is disabled. The /etc/matplotlibrc file says backend : TkAgg by default on Ubuntu, and I assume in the test sandbox the file is inaccessible to we start loading a bunch of other stuff and go off the deep end waiting for some ffpmpeg command line that never returns.

jwnimmer-tri commented 4 years ago

Oh, and bazel-bin/bindings/pydrake/py/all_test --trace=user also passes. Once again proving that bazel run is the devil and why are you trying to use it?

jamiesnape commented 4 years ago

Since python3 -c 'import matplotlib.animation works and bazel test works, not sure this is medium priority. Do we (or Bazel) really support bazel run <test> and/or expect the behavior to be the same as bazel test <test>?

jwnimmer-tri commented 4 years ago

Oh, I hadn't even looked at the issue assignee! I assumed that it was another self-assigned ticket for brain dumping.

There might be something useful to learn here about non-hermetic matplotlib quicksand for other purposes, but I'm not sure this repro is the best way to explore that, and I definitely agree that bazel run <mytest> problems are "priority: backlog" at best.

jamiesnape commented 4 years ago

There is certainly a lot of quicksand once you enumerate the configuration files of the dependency chain.

EricCousineau-TRI commented 4 years ago

I guess I use bazel run because we have examples that state it, and we don't explicitly have any directives not to use it aside from things like this.

If it's really a policy we should encode, should we write docs and purge it's existence? Perhaps even actively disable it? (and if just for unittests, at least the same?)

It'd be super easy with a wrapper Bazel script.

EricCousineau-TRI commented 4 years ago
$ git log -n 1 --oneline --no-decorate
c16eeeaea8 lcm: Add publish_period option to ConnectLcmScope (#14062)
$ grep -rnI 'bazel run' | wc -l
59
jwnimmer-tri commented 4 years ago

Do we have examples that instruct anyone to execute tests using bazel run? At minimum, we should change those instructions. Executing non-test binaries using bazel run is much less suspicious.

EricCousineau-TRI commented 4 years ago

Yes (but fewer):

$ grep -rnI 'bazel run .*_test'
doc/bazel.rst:118:  bazel run multibody/plant:multibody_plant_test -- --gtest_filter='*SimpleModelCreation*'
doc/bazel.rst:124:  bazel run bindings/pydrake:py/symbolic_test -- --trace=user --deprecation_action=error
manipulation/models/ycb/test/parse_test.cc:7:    bazel run //manipulation/models/ycb:parse_test -- --visualize_sec=5
manipulation/models/ycb/test/parse_test.cc:11:    bazel run //manipulation/models/ycb:parse_test -- \
bindings/pydrake/pydrake_doxygen.h:499:    bazel run //bindings/pydrake/systems:py/lifetime_test -- --trace=user
jwnimmer-tri commented 3 years ago

Kicking this back to @EricCousineau-TRI as the only interested party. IIRC when I looked last month, all of the examples in the prior post were authored by Eric anyway. So really, the only action here is to rewrite the test docs to always use bazel test for running tests.

EricCousineau-TRI commented 2 years ago

Encountered again in #17129. Additional action is "use bazel test, and/or ensure tests use Rlocation"

EricCousineau-TRI commented 1 year ago

Running again on Ubuntu Jammy (22.04) on 11bc25640f with minor change:

$ git diff
diff --git a/bindings/pydrake/test/all_test.py b/bindings/pydrake/test/all_test.py
index ee9b4cfb93..d30f6efd3a 100644
--- a/bindings/pydrake/test/all_test.py
+++ b/bindings/pydrake/test/all_test.py
@@ -6,7 +6,7 @@ import warnings
 # When importing pydrake, confirm that it works without any matplotlib
 # customization. We don't need the MPLBACKEND=Template override (from
 # our tools/bazel.rc file) since importing doesn't open any new windows.
-del os.environ['MPLBACKEND']  # noqa
+# del os.environ['MPLBACKEND']  # noqa

 from pydrake.common.test_utilities.deprecation import catch_drake_warnings

$ bazel run //bindings/pydrake:py/all_test
# Runs A-OK. No freezing.

FYI @trowell-tri re: #18323

jwnimmer-tri commented 1 year ago

The #18544 adds a regression test that matplotlib.animation is never imported by pydrake.all. Is that sufficient to close this issue?

EricCousineau-TRI commented 1 year ago

Works for me, yup!