Open huweiATgithub opened 1 year ago
I'm not sure where these should go -- assigning to @SeanCurtis-TRI for disposition.
Hello there, could you provide an update on the current status of this issue? Additionally, I am interested in contributing to this project and would appreciate it if someone could share the design of Drake's implementation of Meshcat. Thank you!
I took a quick look and now have an almost working draft. I will post that draft soon.
I have an initial draft started here: https://github.com/RobotLocomotion/drake/compare/master...RussTedrake:drake:meshcat_capture_image?expand=1
There were a few hang-ups/questions, starting from the most blocking: 1) Meshcat currently drops the connection if you request images over e.g. 100x100 (I suspect that uWebSockets is dropping the connection). I can run
```js
viewer.handle_command({type: "capture_image", xres: 100, yres: 100})
```
in the browser and watch my receiving thread get the image... but if I increase that to, say, 320x240, then the viewer drops the connection and reconnects. @huweiATgithub -- I assume this is working for you in meshcat-python?
2) My first pass at the API was `ImageRgba8U CaptureImage(int x_resolution, int y_resolution, double timeout)`. But it turns out that the returned image is in PNG format, so I'd need to bring in a PNG reader (the VTK PNG reader from a memory buffer looked like it would require ~30 lines!). This also leads to issues about having multiple clients all respond to the capture command by sending you images. What should Meshcat do then? As a passable work-around, I now instead implemented `void CaptureImage(std::filesystem::path filename, int x_resolution, int y_resolution)`. Every message that gets received simply gets written to that file on disk. If multiple responses come in, the last writer wins. Not ideal, but at least it's unambiguous.
@joemasterjohn -- perhaps you can take a look and help me decide which path to march down?
Thanks! I will take a try.
An alternative approach that can return the raw image via msgpack:
viewer.capture_image()
drake::Meshcat
https://stackoverflow.com/a/65019143

Edit: I see now this is still a PNG-encoded array :frowning_face: Sorry this isn't a complete solution.

Thanks @pathammer -- we're doing almost exactly that now (except not the msgpack). We just need a C++ version of the get_pixels step to (I believe) just strip the header off the PNG. It doesn't seem like it should be a very heavy lift (e.g. https://stackoverflow.com/questions/31079947/how-can-i-manually-read-png-files-in-c). We already have VTK if we must use it... but I just want to make sure we like the API before I proceed.
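For context on what a minimal PNG reader actually entails: the pixel data is not simply "header + raw bytes" -- it lives in IDAT chunks that are zlib-compressed and per-row filtered, so stripping the header alone isn't quite enough. Here's a stdlib-only Python sketch of the format (an illustration of the chunk layout, not Drake code; the `encode_1x1_rgba` helper is purely for demonstration):

```python
# Sketch (not Drake code): what a minimal PNG "reader" actually has to do.
# A PNG is an 8-byte signature followed by length/type/data/CRC chunks; the
# pixels live in IDAT chunks, zlib-compressed and per-row filtered.
import struct
import zlib

SIGNATURE = b"\x89PNG\r\n\x1a\n"

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Assemble one PNG chunk: length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def encode_1x1_rgba(r, g, b, a):
    """Build a valid 1x1 RGBA PNG by hand (for demonstration only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)  # 8-bit, color type 6 = RGBA
    scanline = b"\x00" + bytes([r, g, b, a])  # filter byte 0 ("None") + one pixel
    return (SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(scanline)) + chunk(b"IEND", b""))

def decode_pixels(png: bytes):
    """Parse chunks, read IHDR dimensions, and inflate the IDAT payload."""
    assert png[:8] == SIGNATURE, "not a PNG"
    pos, idat = 8, b""
    width = height = None
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"IHDR":
            width, height = struct.unpack(">II", data[:8])
        elif ctype == b"IDAT":
            idat += data
        pos += 12 + length  # length + type + data + CRC
    raw = zlib.decompress(idat)
    # Each row starts with a filter byte; filter 0 means the row is raw RGBA.
    # A general decoder must also undo filters 1-4 (Sub/Up/Average/Paeth).
    stride = 1 + 4 * width
    rows = [raw[i * stride + 1:(i + 1) * stride] for i in range(height)]
    return width, height, rows

w, h, rows = decode_pixels(encode_1x1_rgba(255, 0, 0, 255))
print(w, h, list(rows[0]))  # -> 1 1 [255, 0, 0, 255]
```

The filter step is the part that makes a from-scratch reader more than a few lines, which is why reusing the existing VTK reader is attractive.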
> I'd need to bring in a png reader (the vtk png reader from a memory buffer looked like it would require ~30 lines!)
I'd say 30 lines of code is not very scary, anyway. (Alternatively, we could also refactor lcm_image_array_to_images.h
to extract the png decoder as a callable helper function.)
> This also leads to issues about having multiple clients all respond to the capture command by sending you images. What should Meshcat do then? As a passable work-around, I now instead implemented void CaptureImage(std::filesystem::path filename, int x_resolution, int y_resolution). Every message that gets received simply gets written to that file on disk. If multiple responses come in, the last writer wins. Not ideal, but at least it's unambiguous.
Unless done carefully, this would lead to race conditions, where the file is overwritten while someone else is reading it. We'd need to be careful to write the image to a tempfile in the same output directory, and then after it's done writing and closed, rename it to the final filename. (File renames within the same directory are atomic.)
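The write-to-tempfile-then-rename pattern described above can be sketched as follows (a hypothetical stdlib-only helper, not Drake's implementation):

```python
# Sketch of the atomic-write pattern described above: write to a tempfile in
# the destination's directory, then rename. Readers see either the old
# complete file or the new complete file, never a partially written one.
import os
import tempfile

def atomic_write(path: str, data: bytes) -> None:
    directory = os.path.dirname(os.path.abspath(path))
    # The tempfile must live in the same directory as the destination:
    # os.replace() is only atomic within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ensure the bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic rename within the directory
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With this shape, "last writer wins" still holds, but a concurrent reader can never observe a half-written image file.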
In any case, I don't love leaving it up to chance. If there are multiple connections, I think we either need to have GetImage throw an exception immediately, or else designate either "first connection wins" or "last connection wins", ask for only one image, and return only that one.
In the extreme, I suppose we could return a vector of images (one per connection), but I'm not sure anyone would want that?
Ok. I've taken one more pass based on @jwnimmer-tri's feedback.
My branch now does everything we like except it still fails for reasonably sized images. I haven't looked into that yet. https://github.com/RobotLocomotion/drake/compare/master...RussTedrake:drake:meshcat_capture_image?expand=1
Specifically, in a notebook, if you run a first cell

```python
from pydrake.all import StartMeshcat

meshcat = StartMeshcat()
```

then connect to meshcat in your browser, you can then run a second cell

```python
import matplotlib.pyplot as plt

capture = meshcat.CaptureImage(100, 100, timeout=1)
if capture.size() > 0:
    fig, ax = plt.subplots()
    ax.imshow(capture.data)
```
to see a 100x100 image like this:
I think we just need to look into the message size issue, and then clean up the still messy branch.
> ... I'd need to bring in a png reader ...
We have a from-memory PNG reader more easily available now. See drake/systems/sensors/vtk_image_reader_writer.h
for the API and drake/systems/sensors/lcm_image_array_to_images.cc
for an example of using it.
FYI I'm going to see if I can find any fix for the larger images. I want to use this feature to generate new images for #18913 semi-automatically.
Edit:
Here's my branch. I got far enough to rebase and port to the VTK PNG reader, and confirmed that I also see the crash. I ran out of time to find a fix. (I only ended up needing 1 screenshot, so I just did it by hand.)
FYI the PNG-reading part is easy now. We have a class that loads images in PNG format (and more) into Drake's Image class: https://drake.mit.edu/doxygen_cxx/classdrake_1_1systems_1_1sensors_1_1_image_io.html
Since this issue hasn't been resolved yet, anyone who wants to capture images from MeshCat can try this workaround:
```python
from selenium import webdriver
import base64

# Set up Chrome options to run in headless mode
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

# Initialize the Chrome WebDriver
driver = webdriver.Chrome(options=chrome_options)

# Open the Meshcat viewer's URL
driver.get(meshcat.web_url())

image_data_url = driver.execute_script("""
    return viewer.capture_image(1920, 1080);  // Capture the image at the given width and height
""")

# Extract the base64 payload of the data URL (after "data:image/png;base64,")
image_base64 = image_data_url.split(",")[1]

# Decode the base64 string and write it out as an image file
image_data = base64.b64decode(image_base64)
with open("./test_data/meshcat_image.png", "wb") as f:
    f.write(image_data)
print("Image saved as 'meshcat_image.png'")

# Close the browser session
driver.quit()
```
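If the workaround above proves fragile in your setup, the data-URL handling can be made more defensive using only the standard library. This sketch validates the prefix, the base64 payload, and the PNG signature before writing anything to disk (the selenium part is unchanged; `decode_png_data_url` is a hypothetical helper name):

```python
# Defensive parsing of the "data:image/png;base64,..." URL returned by the
# browser (stdlib only; a sketch to harden the workaround above).
import base64
import binascii

def decode_png_data_url(data_url: str) -> bytes:
    prefix = "data:image/png;base64,"
    if not data_url.startswith(prefix):
        raise ValueError(f"unexpected data URL prefix: {data_url[:32]!r}")
    try:
        # validate=True rejects stray characters instead of silently skipping them.
        png_bytes = base64.b64decode(data_url[len(prefix):], validate=True)
    except binascii.Error as e:
        raise ValueError(f"invalid base64 payload: {e}") from e
    if png_bytes[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("decoded bytes are not a PNG")
    return png_bytes
```

This fails loudly (instead of writing a corrupt file) if the viewer returns an error page, an empty capture, or a non-PNG data URL.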
meshcat-python has a function that is quite useful for visualization: it can directly capture images from the visualizer. See:
https://github.com/rdeits/meshcat-python/blob/cd04af433f2196af2c1fa8d52457298938b9a838/src/meshcat/visualizer.py#L71-L80
It seems that Drake's C++ implementation of Meshcat does not provide such a utility: https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_meshcat_animation.html#a58a8ccf1df1c55c0f9ae6b3644307ad3
Describe the solution you'd like
Add a similar function to Meshcat to capture images.
Describe alternatives you've considered
I think capturing images from the visualizer is not the same as rendering images in the environment. RgbdSensor is a much more serious approach.