RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Enable multi-threaded execution of MonteCarloSimulation when launched from Python #17363

Open RussTedrake opened 2 years ago

RussTedrake commented 2 years ago

This is a follow-up to the conversation started in this Slack thread: https://drakedevelopers.slack.com/archives/C2CHRT98E/p1653658311307059

The Monte Carlo simulation tools have become a useful model for multi-process executions in Drake. We currently disallow multi-threaded execution for MonteCarloSimulation at the binding layer: https://github.com/RobotLocomotion/drake/blob/4cfbb2e03de83351e1614731287d5a6d8fccc668/bindings/pydrake/systems/analysis_py.cc#L280-L282

According to @calderpg-tri ... running the current Python tests for Monte Carlo with multi-processing enabled will cause the system to hang. My current understanding of the problem is that the C++ code needs to deal with the Python GIL (see, for instance, https://github.com/pybind/pybind11/issues/1087). The big question is whether we can handle this at the level of the bindings, or whether every potential C++ caller must deal with managing the lock.

Enabling this for the Monte Carlo tools will pave the way for many other relevant use cases (such as trajectory optimization and DrakeGym) where users would benefit substantially from multi-process executions while retaining the ability to pass Python callbacks into the C++ execution.

+@EricCousineau-TRI as pydrake owner.

EricCousineau-TRI commented 2 years ago

I liked the original intent of the thread - to explicitly fail if attempting to mix C++ multithreading with Python.

For concurrent execution in Python involving CPython API extensions, I think multiprocessing is perhaps the simplest / best way forward. Fixing the pydrake bindings to manage the GIL is feasible, but would require some design thought, especially when interacting with other CPython API extensions such as OpenCV.

I will probably be unable to dedicate any real time to this in the near future myself. Am not entirely sure how to delegate.

jwnimmer-tri commented 2 years ago

For concurrent execution in Python involving CPython API extensions, I think multiprocessing is perhaps the simplest / best way forward.

Yes, I tend to agree with this.

Am not entirely sure how to delegate.

For my own part, I don't see any particular urgency here. I believe the resolution would be to create an illustration / example / tutorial showing how to set up concurrent simulations using multiprocessing.
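A minimal sketch of what such a tutorial might show, using only the standard library (the function names here are placeholders, not Drake API): each worker process constructs its own model from scratch, so nothing unpicklable ever crosses the process boundary.

```python
import multiprocessing
import random

def run_one_simulation(seed):
    """Build and run everything inside the worker process.

    In a real pydrake workflow the DiagramBuilder, Diagram, Context, and
    Simulator would all be constructed here; a seeded random draw stands
    in for the simulation result to keep this sketch self-contained.
    """
    rng = random.Random(seed)
    # ... build the diagram, make a context, advance the simulator ...
    return rng.uniform(0.0, 1.0)  # stand-in for a final output value

def run_all(num_rollouts, num_workers=4):
    # Only the integer seeds are pickled and shipped to the workers;
    # "fork" keeps the sketch portable across Python versions on POSIX.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=num_workers) as pool:
        return pool.map(run_one_simulation, range(num_rollouts))
```

Because each rollout is keyed only by its seed, reruns are reproducible and the pattern scales to as many workers as the machine has cores.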

RussTedrake commented 2 years ago

Two thoughts: 1) Python multiprocessing normally gets stuck because our Diagram and Context objects aren't pickle-able. In DrakeGym I had to work around that by deferring the construction of the diagrams/contexts to the subprocess (which is not ideal).

2) An example showing multiprocessing would be good, but I'm pretty sure it is not the resolution I seek. I want to run concurrent C++ code (like Monte Carlo simulations), but in workflows that can be invoked from Python. A better resolution from my perspective would be to enable multi-threading in MonteCarloSimulation, but have it throw if it encounters a Python callback during parallel execution. This is where my questions started in the Slack thread linked above.

jwnimmer-tri commented 2 years ago

(2) Sure. If we add a new function bool System::IsThreadSafe() const then we can have Monte Carlo fail-fast in case there is a PyLeafSystem or PyVectorSystem or similar in play when num_threads > 1.

This should catch >90% of the user errors. (I'm not confident that we can plug all of the callback holes. The Monte Carlo documentation should directly explain to Python users that they cannot have any Python code in their Diagram in any way.)
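A rough sketch of that fail-fast behavior in plain Python (the classes and the `is_thread_safe` method here are illustrative stand-ins mirroring the proposed `System::IsThreadSafe()`, not existing Drake API):

```python
class CxxSystem:
    """Stand-in for a C++-backed system; assumed thread safe."""
    def is_thread_safe(self):
        return True

class PyLeafSystem:
    """Stand-in for a Python-authored LeafSystem; not thread safe."""
    def is_thread_safe(self):
        return False

def check_parallel_ok(systems, num_threads):
    """Fail fast before launching any worker threads."""
    if num_threads > 1 and not all(s.is_thread_safe() for s in systems):
        raise RuntimeError(
            "Diagram contains a Python-authored system; parallel "
            "Monte Carlo simulation is not supported. Use num_threads=1.")
```

Serial runs (`num_threads == 1`) pass unconditionally, so Python systems keep working in the non-parallel path.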

The main challenge will be the ScalarSystemFunction output callback. It is called on multiple threads, but is a Python functor. We could add a lock in the pydrake bindings so that output calculations happen one at a time, blocking the worker threads in the meantime; the same lock would also need to interpose itself on the simulator factory callback -- the Python code can either be making a new simulator or summarizing the output, never both concurrently.
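In pure Python, the interposed lock amounts to something like this sketch (`make_serialized` and the callback names are illustrative, not pydrake API):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by every wrapped callback, analogous to interposing the
# same lock on both the simulator-factory and output callbacks.
_callback_lock = threading.Lock()

def make_serialized(callback):
    """Wrap a Python callback so worker threads invoke it one at a time."""
    def wrapper(*args, **kwargs):
        with _callback_lock:
            return callback(*args, **kwargs)
    return wrapper

# Stand-in for the ScalarSystemFunction output callback: not thread-safe
# on its own, because it mutates shared state.
log = []

def scalar_output(sample_index):
    log.append(sample_index)
    return float(sample_index)

def run(num_samples, num_threads=4):
    safe_output = make_serialized(scalar_output)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(safe_output, range(num_samples)))
```

The cost is exactly the one described above: while one worker holds the lock inside the callback, the others block.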

Another approach would be to add a "bool use_serial_callbacks" flag to Monte Carlo itself, so that it internally governs the make vs. output callbacks -- it might be able to keep the thread schedule fuller if we give it direct control. It might also end up being useful for C++ code that can't easily make a thread-safe output calculator.

(1) I think pickling the functor that creates the diagram, instead of the diagram itself, should be a fairly easy and plausible work-around for many cases. It also makes it possible to scale the concurrency beyond a single machine. The functor can have bound arguments (functools.partial) for any extra configuration data; it doesn't always need to be a zero-argument function. It will also scale to the case where the user needs to add a Python System (e.g., looping in their controller). It is a minor hurdle to carve out the factory functor so it can be pickled, but it has the upside that you don't hit a brick wall anytime you need to go beyond pydrake's C++-bound code.
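A sketch of that work-around (`make_plant` and its parameters are hypothetical; a plain dict stands in for the Diagram): the configured factory pickles cleanly even though the built object would not, and each worker calls it to build its own copy.

```python
import functools
import multiprocessing
import pickle

def make_plant(time_step, gravity):
    """Hypothetical diagram factory: in pydrake this would build and
    return a fresh Diagram; a plain dict stands in for it here."""
    return {"time_step": time_step, "gravity": gravity}

def simulate(factory, seed):
    # Each worker builds its own "diagram" from the pickled factory.
    diagram = factory()
    return (seed, diagram["time_step"])

def run_demo(num_rollouts=4):
    # Bind the configuration data into the factory; the partial object
    # round-trips through pickle, unlike a built Diagram would.
    factory = functools.partial(make_plant, time_step=0.001, gravity=9.81)
    restored = pickle.loads(pickle.dumps(factory))
    ctx = multiprocessing.get_context("fork")  # POSIX-only sketch
    with ctx.Pool(processes=2) as pool:
        return pool.starmap(simulate,
                            [(restored, s) for s in range(num_rollouts)])
```

The same factory-plus-bound-arguments payload could be shipped to workers on other machines, which is what enables scaling beyond a single host.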

calderpg-tri commented 2 years ago

(2) For the case of the ScalarSystemFunction output, another approach would be to restructure the dispatch loop so that output is only called serially on completed simulations, no locks necessary. This doesn't solve potential callbacks inside the systems themselves, but it does ensure that the user-provided methods are never called in parallel.
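A thread-based sketch of that restructured dispatch loop (stand-in functions, not Drake API): simulations run on worker threads, while completed results are drained and handed to the user's output callback serially, only ever on the dispatching thread.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(seed):
    """The parallel part: no user-provided Python callbacks in here."""
    return seed * seed

def run(seeds, output, num_threads=2):
    results = []
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(simulate, s) for s in seeds]
        # Completed simulations are collected here one at a time, so the
        # user's output callback never runs concurrently with itself.
        for future in as_completed(futures):
            results.append(output(future.result()))
    return sorted(results)
```

The worker threads stay busy simulating while only the cheap output bookkeeping is serialized, which is the appeal of this approach over a lock around the callback.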

jwnimmer-tri commented 5 months ago

FYI we have this comment now:

https://github.com/RobotLocomotion/drake/blob/fda9e48b6a0577c862cd07d034d2f256413d7bb6/bindings/pydrake/systems/analysis_py.cc#L419-L422

Adding the parallelism=... arg to the bindings is probably easy.