RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Make Meshcat thread-safe #20889

Open jwnimmer-tri opened 9 months ago

jwnimmer-tri commented 9 months ago

Is your feature request related to a problem? Please describe.

Drake's Meshcat class overview currently says:

@warning In the current implementation, Meshcat methods must be called from the
same thread where the class instance was constructed.  For example, running
multiple simulations in parallel using the same Meshcat instance is not yet
supported. We may generalize this in the future.

As more people use Drake in multi-threaded ways, this is increasingly becoming a pain point.

In many cases, users want to have a single browser window open with the simulated scene, visual debugging hints (trajectories), EE pose targets, etc. Sometimes those values end up being calculated on different threads, but the user still wants to see them in a central location.

In the past when our visualization happened by sending messages, they could just send different messages on different threads and everything would be OK, but now that we have an in-process singleton server that we speak to via API calls, that use case becomes more difficult.

We'd like to provide a way to have multiple threads in the same process all contribute illustration information to a single, shared display.

Describe the solution you'd like

The easiest way to accomplish this is probably to adjust the class Meshcat implementation so that we can remove the warning. I think this would basically involve launching a second internal thread, and every public function would post a future into that service thread and await the promised result. Then the worker thread could just pop the deferred calls off the work queue and service them, so we don't need any locks beyond just the work queue. Conceivably the worker thread could actually be the websocket thread we already have, but there's often a non-trivial amount of prep work we need to do (e.g., loading a mesh from disk) before we're ready to modify the scene tree, so my initial feeling is that a second thread would be meaningfully more performant.
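A minimal sketch of the service-thread idea described above, under stated assumptions: `SharedDisplay`, `SetLabel`, `NumLabels`, and `Post` are hypothetical names for illustration and are not part of Drake's Meshcat API. Each public method wraps its work in a `std::packaged_task`, pushes it onto a queue, and blocks on the returned future. Only the queue is guarded by a lock; the scene data is touched exclusively by the worker thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a display object; NOT Drake's Meshcat class.
class SharedDisplay {
 public:
  SharedDisplay() : worker_([this] { ServiceLoop(); }) {}

  ~SharedDisplay() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // The worker drains any remaining jobs, then exits.
  }

  // Public methods may be called from any thread; each one posts a job to
  // the worker and waits for the promised result.
  void SetLabel(const std::string& path, const std::string& text) {
    Post([this, path, text] { scene_[path] = text; }).get();
  }

  int NumLabels() {
    return Post([this] { return static_cast<int>(scene_.size()); }).get();
  }

 private:
  // Enqueues `func` for the worker thread and returns a future for its result.
  template <typename F>
  auto Post(F func) -> std::future<std::invoke_result_t<F&>> {
    using R = std::invoke_result_t<F&>;
    auto task = std::make_shared<std::packaged_task<R()>>(std::move(func));
    std::future<R> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

  // Runs on the worker thread: pops deferred calls and services them in order.
  void ServiceLoop() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Stop was requested and the queue is drained.
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();  // Executed without holding the queue lock.
    }
  }

  std::mutex mutex_;  // Guards queue_ and stopping_ only.
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool stopping_ = false;
  std::map<std::string, std::string> scene_;  // Touched only by worker_.
  std::thread worker_;  // Declared last so the other members exist before it starts.
};

// Four producer threads contribute to one shared display.
int DemoFourThreads() {
  SharedDisplay display;
  std::vector<std::thread> producers;
  for (int i = 0; i < 4; ++i) {
    producers.emplace_back([&display, i] {
      display.SetLabel("/debug/thread" + std::to_string(i), "hello");
    });
  }
  for (auto& t : producers) t.join();
  const int n = display.NumLabels();
  assert(n == 4);
  return n;
}
```

Blocking on the future keeps each public call synchronous from the caller's point of view. Expensive preparation, such as reading a mesh from disk, could happen either before `Post` on the calling thread or inside the posted job; this sketch does not take a position on which is better.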

Describe alternatives you've considered

Another option (which would have other benefits) would be to allow running class Meshcat against a long-lived, external server process. Then the work queue is all just awaiting tx/rx of the messages to the remote server. This would allow running a multi-process sim all into the same display head. Probably that's a different issue than this one, however.

Additional context

In the code itself, the goal is to be able to remove the "is main thread" checks like this one:

https://github.com/RobotLocomotion/drake/blob/e3c608505bbe3c7fe010c7d137505a05ccde9449/geometry/meshcat.cc#L897-L901
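The linked lines are not reproduced here. As a generic illustration of what a thread-affinity check of this kind tends to look like, here is a sketch in which every name (`ThreadAffineWidget`, `IsOwnerThread`, `Mutate`) is hypothetical and not taken from `meshcat.cc`: the object records the constructing thread's id and rejects calls from any other thread.

```cpp
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical class for illustration; not Drake code.
class ThreadAffineWidget {
 public:
  // Remember which thread constructed the object.
  ThreadAffineWidget() : owner_(std::this_thread::get_id()) {}

  bool IsOwnerThread() const {
    return std::this_thread::get_id() == owner_;
  }

  // Rejects calls from any thread other than the constructing one.
  void Mutate() {
    if (!IsOwnerThread()) {
      throw std::logic_error("Mutate() called from the wrong thread");
    }
    ++num_mutations_;
  }

  int num_mutations() const { return num_mutations_; }

 private:
  const std::thread::id owner_;
  int num_mutations_ = 0;
};

// Returns true if a call from a second thread is rejected while the call
// from the constructing thread succeeds.
bool MutateFromOtherThreadThrows() {
  ThreadAffineWidget widget;
  widget.Mutate();  // Accepted: same thread as the constructor.
  bool threw = false;
  std::thread other([&widget, &threw] {
    try {
      widget.Mutate();  // Rejected: different thread.
    } catch (const std::logic_error&) {
      threw = true;
    }
  });
  other.join();
  assert(threw);
  return threw && widget.num_mutations() == 1;
}
```

Making the class thread-safe would mean such checks are no longer needed, because calls from any thread would be handled correctly instead of being rejected.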

bhogan-bdai commented 7 months ago

What is the status of this?

I have a use case that displays 4 Meshcat scenes (visualization only, not physical simulation) in a dashboard. The data that populates and animates the visuals comes from a prior physically simulated run using Drake.

The restriction that a Meshcat object can only be used on the thread that created it has forced quite a few changes to an implementation that was otherwise agnostic to threading.

The restriction that the Meshcat object must be used on the main thread also prevents a threaded Condition + Queue solution when 4 scenes are desired.