Closed mntan3 closed 3 years ago
Assigning to @mntan3 for further investigation.
@manuelli said he found that the numpy -> msgpack conversion is extremely slow out of the box (in both directions), but that this package implements a much faster alternative. https://pypi.org/project/msgpack-numpy/
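For intuition on why an element-by-element conversion can be so much slower than a bulk one, here is a minimal stdlib-only sketch. It does not use msgpack or msgpack-numpy; `struct` and `array` are stand-ins for "pack one value at a time" versus "copy the whole buffer at once", and the 100k-float payload is a made-up size.

```python
import struct
import timeit
from array import array

# Hypothetical payload: 100k float32 values (e.g. mesh vertex coordinates).
values = [float(i) for i in range(100_000)]
packed_array = array("f", values)


def pack_elementwise(vals):
    # One Python-level call per element, similar in spirit to a serializer
    # that walks an array value by value.
    return b"".join(struct.pack("=f", v) for v in vals)


def pack_bulk(arr):
    # A single copy of the underlying contiguous buffer.
    return arr.tobytes()


if __name__ == "__main__":
    slow = timeit.timeit(lambda: pack_elementwise(values), number=5)
    fast = timeit.timeit(lambda: pack_bulk(packed_array), number=5)
    print(f"elementwise: {slow:.4f}s  bulk: {fast:.4f}s")
```

Both functions produce the same bytes; only the number of Python-level operations differs, which is presumably the kind of overhead the faster package avoids.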
fwiw, I think the right solution is probably to write our meshcat visualizer in c++.
cc @xuchenhan-tri
FTR, Erwin has a thin c++ meshcat zmq code here: https://github.com/google-research/tiny-differentiable-simulator/blob/297dc2232780396c7f63cb8be97a5c8490ebc653/examples/meshcat_zmq.h
Planned resolution is to move to c++.
fwiw -- i'm planning to spike-test a c++ implementation over the next few days.
I'll leave some notes here to document some of the relevant decisions.
Websockets not ZMQ. meshcat-python uses a separate ZMQ server to relay between python and the browser:
python Visualizer <=zmq=> zmqserver <=websockets=> browser
I intend to go directly from c++ meshcat to the browser:
c++ Visualizer <= websockets => browser
Having discussed with @rdeits, the zmq server design was put in place partially to support multiple geometry suppliers (the visualizers) and consumers (the browsers), but also just to parcel out the asyncio complexities away from the supplier in python. His Julia meshcat Visualizer just goes straight to the browser via websockets, and he's been recommending that to me when I upgrade. This is especially relevant because I am trying to add new support for gui elements in the meshcat browser sending information back to c++, and the zmqserver in the middle complicates that workflow significantly.
C++ websocket libraries. I've now explored a handful of websocket libraries that we could potentially use in drake c++. This list was helpful. Taking a number of factors into account, such as licensing and light dependencies, I ended up looking most closely at:
Basic C++ design is currently:
My current PR strategy is:
1) Meshcat proof of life. Starts the server, demonstrates that clients can connect, and just sends one type of message to show that data can flow. Reviewers can focus on the build system and websocket server details.
2) Bring in testing framework. Requires new build dependencies, which I want to separate from the original PR.
3) Meshcat full api (set_transform, delete, etc). Still with only a modicum of geometry supported.
4) Basic MeshcatVisualizer (c++ only, no bindings) implementation. Importantly, this version will have optional output ports for ui feedback.
5) Add python bindings.
Then we can add more geometry / bells and whistles incrementally.
For a more mature testing strategy, I'm currently trying:
Running meshcat.Viewer() directly in node.js, so that I can provide a test utility that connects to my C++ websocket server and checks for certain conditions to be met (e.g. that set_property('/Background', 'visible', false) has the desired result). This seems the ideal in terms of verifying correctness. It has some immediate challenges in terms of getting meshcat to run headless, which I'm slowly bashing through, and then adding nodejs + npm libraries into the drake test installation framework.
Alternatives could include:
Test strategy update: I've got the following working well with a minimal nodejs setup:
// Test utility for Meshcat that
// 1) Connects a (headless) meshcat Viewer object via websockets to `ws_url`,
// 2) Waits until the Viewer receives `num_messages_to_wait_for` messages
// (default: 0),
// 3) Evaluates the string `eval_string`.
// 4) Exits with return code 0 if the `eval_string` evaluates to `true`,
// otherwise with return code 1.
//
// Run with `node meshcat_test.js ws_url eval_string [num_messages_to_wait_for]`
// e.g.
// node meshcat_test.js 'ws://localhost:7001' \
// "viewer.scene_tree.find(['Background']).object.visible == true" 3
//
// Requires meshcat, and `npm install jsdom webgl-mock-threejs canvas`.
The full script is here: meshcat_test.js
... adding nodejs + npm libraries into the drake test installation framework.
FYI My rough impression from very quick glances in the past was that node and npm were extremely difficult to make sufficiently hermetic for use in Drake. You might want to de-risk that before walking too far down this path. Maybe https://github.com/bazelbuild/rules_nodejs has already resolved this by now, but I don't think we know for sure yet.
connecting to the websocket (probably from python) and simply verifying that the message is getting through as expected.
Why is this option not the best answer? We don't acceptance test drake-visualizer round trip; we assume that it has its own testing in place, and so within Drake we just check that the messages we are sending it are as desired. That same story seems like it should be plenty sufficient for meshcat as well? If we find that too many bugs are slipping through, we can always upgrade to a headless regression test in the future.
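To make the message-level option concrete, here is a rough sketch of what such a check might look like. It mirrors the wait-for-N-messages, evaluate, return-an-exit-code flow of the node test utility, but operates on any iterable of already-received messages so it stays independent of a websocket client. The JSON encoding and the field names (`type`, `path`, `property`, `value`) are assumptions for illustration; the real wire format may differ.

```python
import json


def check_messages(messages, num_messages_to_wait_for, predicate):
    """Return 0 if `predicate` holds once enough messages arrived, else 1.

    `messages` is any iterable of JSON strings (a hypothetical stand-in for
    frames read from the websocket).
    """
    received = []
    stream = iter(messages)
    while len(received) < num_messages_to_wait_for:
        try:
            raw = next(stream)
        except StopIteration:
            # The stream ended before enough messages were seen.
            return 1
        received.append(json.loads(raw))
    return 0 if predicate(received) else 1


# Example: verify that a set_property message hides the background.
frames = [
    json.dumps({
        "type": "set_property",
        "path": "/Background",
        "property": "visible",
        "value": False,
    })
]
exit_code = check_messages(
    frames, 1, lambda msgs: msgs[-1]["value"] is False
)
```

This only verifies what was sent, not how the browser renders it, which is the trade-off being discussed above.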
Tentative plan for transitioning from python to c++:
1) Bind drake::geometry::Meshcat to pydrake.geometry.Meshcat. There is no conflict here.
2) Bind drake::geometry::MeshcatVisualizer to pydrake.geometry.MeshcatVisualizerCpp to avoid any naming conflict, even with the different package paths; it also forces the user to realize that they are getting a different object, which has a slightly different workflow (no zmqserver, etc).
3) Leave pydrake.systems.meshcat_visualizer.MeshcatVisualizer mostly alone for now, until the cpp version achieves feature parity. Probably after I get through porting my notes in this fall semester would be a natural time for the next phase.
4) Once we feel that feature parity has been reached, we can deprecate the python MeshcatVisualizer; there should be a reasonable way to do this since the constructors will take different arguments: the c++ version will want a Meshcat object passed in, the python version wants a zmq_url, etc. Also, we have the fact that the c++ version will offer AddToBuilder, which DrakeVisualizer switched to, and the python version still uses the original Connect*Visualizer spelling.
Issue Description
When visualizing a model with many links, I saw that meshcat slowed down the simulator significantly, whereas Drake Visualizer did not. It would be nice if this were at least documented, if in fact meshcat is just slower than Drake Visualizer.
Example test script and model here to replicate:
https://gist.github.com/mntan3/be02a1c410a0830f2ddb656aaf6403e2
After running the script for 10 seconds, it should print out the simulator rate. I observed a rate of about 0.8 with Drake Visualizer and 0.6 with meshcat on my computer.
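For readers comparing the two numbers, a small sketch of the arithmetic, assuming "simulator rate" means simulated time advanced divided by wall-clock time elapsed (that reading is an assumption; the gist itself defines what is printed). The function names are illustrative only.

```python
def realtime_rate(sim_time_advanced, wall_time_elapsed):
    """Simulated seconds advanced per wall-clock second."""
    if wall_time_elapsed <= 0:
        raise ValueError("wall_time_elapsed must be positive")
    return sim_time_advanced / wall_time_elapsed


def relative_slowdown(baseline_rate, other_rate):
    """Fraction by which `other_rate` falls short of `baseline_rate`."""
    return 1.0 - other_rate / baseline_rate


# Under this reading, a rate of 0.8 over 10 wall-clock seconds corresponds
# to 8 simulated seconds, and 0.6 corresponds to 6 simulated seconds.
drake_vis_rate = realtime_rate(8.0, 10.0)
meshcat_rate = realtime_rate(6.0, 10.0)
slowdown = relative_slowdown(drake_vis_rate, meshcat_rate)
```

With the reported figures, meshcat comes out roughly a quarter slower than Drake Visualizer on the same model.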
Initial discussions from slack:
https://drakedevelopers.slack.com/archives/C43KX47A9/p1586538504017800
sean.curtis 2 hours ago
I think you may just be victim of python loops vs C++ loops.
mntan 2 hours ago
Just to clarify, so you're saying that meshcat is written in python and drake_vis is written in c++, so that's why meshcat is going to be slower?
sean.curtis 2 hours ago
Essentially -- there may be other reasons, but that will be one wall you won't be able to get around. And the key, particularly, is the work that has to be done in a Drake System to translate Drake state to be consumed by the visualizer. You might try collecting timing on the publish method of the mesh cat visualizer -- easy enough to do in python. I bet most of the time lost is spent right there.
eric.cousineau 1 hour ago
I think this may be good to track in an issue. @mntan Do you feel comfortable porting this to a Drake issue?
eric.cousineau 1 hour ago
My guess is it might be slow due to mesh conversion for sending it over the wire?
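One way the suggested timing could be collected, as a minimal sketch: wrap the method of interest and accumulate wall-clock time across calls. `FakeVisualizer` and its `publish` method are hypothetical placeholders, not the actual pydrake class or method name; the wrapper itself uses only the standard library.

```python
import time
from functools import wraps


class CallTimer:
    """Accumulates call count and total wall-clock time of a wrapped callable."""

    def __init__(self):
        self.calls = 0
        self.total_seconds = 0.0

    def wrap(self, func):
        @wraps(func)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the time even if the wrapped call raises.
                self.total_seconds += time.perf_counter() - start
                self.calls += 1
        return timed


class FakeVisualizer:
    # Hypothetical stand-in for a visualizer system with a publish method.
    def publish(self, context):
        time.sleep(0.001)  # pretend to convert and send geometry
        return "published"


timer = CallTimer()
vis = FakeVisualizer()
vis.publish = timer.wrap(vis.publish)  # patch this instance only
for _ in range(5):
    vis.publish(None)
print(f"{timer.calls} calls, {timer.total_seconds:.4f}s total")
```

Comparing the accumulated total against the overall wall-clock time of the run would show whether most of the lost time is indeed spent in publishing.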