RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Get Images from Meshcat #18912

Open huweiATgithub opened 1 year ago

huweiATgithub commented 1 year ago

meshcat-python has a function that is quite useful for visualization: it can directly capture images from the visualizer. See:

https://github.com/rdeits/meshcat-python/blob/cd04af433f2196af2c1fa8d52457298938b9a838/src/meshcat/visualizer.py#L71-L80

It seems that Drake's C++ implementation of meshcat does not provide such a utility. https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_meshcat_animation.html#a58a8ccf1df1c55c0f9ae6b3644307ad3

I think capturing images from the visualizer is not the same as rendering images in the environment. It would be useful to have a similar capability in Drake.

Describe the solution you'd like
Add a similar function to Drake's Meshcat to capture images (roughly as sketched below, after the alternatives).

Describe alternatives you've considered
I think capturing images from the visualizer is not the same as rendering images in the environment. RgbdSensor is a much more involved approach.
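
For concreteness, usage along these lines is what is being asked for (a purely hypothetical sketch; the CaptureImage name and arguments are illustrative only and did not exist in Drake when this issue was filed):

from pydrake.all import StartMeshcat

meshcat = StartMeshcat()
# ... add geometry and open the visualizer page in a browser ...

# Hypothetical API: ask the connected browser to render the current scene
# and send back a frame at the requested resolution.
image = meshcat.CaptureImage(640, 480)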

sherm1 commented 1 year ago

I'm not sure where these should go -- assigning to @SeanCurtis-TRI for disposition.

huweiATgithub commented 1 year ago

Hello there, could you provide an update on the current status of this issue? Additionally, I am interested in contributing to this project and would appreciate it if someone could share the design of Drake's implementation of Meshcat. Thank you!

RussTedrake commented 1 year ago

I took a quick look and now have an almost working draft. I will post that draft soon.

RussTedrake commented 1 year ago

I have an initial draft started here: https://github.com/RobotLocomotion/drake/compare/master...RussTedrake:drake:meshcat_capture_image?expand=1

There were a few hang-ups/questions... starting from the most blocking: 1) Meshcat currently drops the connection if you request images over e.g. 100x100 (I suspect that uWebSockets is dropping the connection). I can run

viewer.handle_command({type:"capture_image", xres:100, yres:100})

in the browser, and watch my receiving thread get the image... but if I increase that to, say, 320x240, then the viewer drops the connection and reconnects. @huweiATgithub -- I assume this is working for you in meshcat-python?

2) My first pass at the API was ImageRgba8U CaptureImage(int x_resolution, int y_resolution, double timeout). But it turns out that the returned image is in png format. So I'd need to bring in a png reader (the vtk png reader from a memory buffer looked like it would require ~30 lines!). This also leads to issues about having multiple clients all respond to the capture command by sending you images. What should Meshcat do then? As a passable work-around, I now instead implemented void CaptureImage(std::filesystem::path filename, int x_resolution, int y_resolution). Every message that gets received simply gets written to that file on disk. If multiple responses come in, the last writer wins. Not ideal, but at least it's unambiguous.

3) Slightly annoying: meshcat currently sends the image back in a JSON format instead of using msgpack. That makes the decoding on the Drake side slightly ugly. And perhaps it could explain the crash for large images? Not sure.

@joemasterjohn -- perhaps you can take a look and help me decide which path to march down?

huweiATgithub commented 1 year ago

Thanks! I will give it a try.

pathammer commented 1 year ago

An alternative approach that could return the raw image via msgpack:

  1. Add a 'capture_image_raw' handler to meshcat.html.
  2. Save the result of viewer.capture_image().
  3. Return a base64 msgpack-encoded buffer to drake::Meshcat (https://stackoverflow.com/a/65019143).

Edit: I see now this is still a png-encoded array :frowning_face: Sorry, this isn't a complete solution.

RussTedrake commented 1 year ago

Thanks @pathammer -- we're doing almost exactly that now (except not the msgpack). We just need a C++ version of the get_pixels step to (I believe) just strip the header off the png. It doesn't seem like it should be a very heavy lift (e.g. https://stackoverflow.com/questions/31079947/how-can-i-manually-read-png-files-in-c). We already have vtk if we must use it... but I just want to make sure we like the API before I proceed.
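
(For comparison, the Python analogue of that get_pixels step is a single library call; note that a PNG payload is zlib-compressed, so it generally needs a real decoder rather than only header stripping. A minimal sketch, assuming Pillow and numpy are installed and "capture.png" is a stand-in for the received PNG bytes:)

import io

import numpy as np
from PIL import Image

# Stand-in for the PNG payload received over the websocket.
with open("capture.png", "rb") as f:
    png_bytes = f.read()

# Decode (decompress) the PNG into an HxWx3 or HxWx4 uint8 pixel array.
pixels = np.asarray(Image.open(io.BytesIO(png_bytes)))
print(pixels.shape, pixels.dtype)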

jwnimmer-tri commented 1 year ago

I'd need to bring in a png reader (the vtk png reader from a memory buffer looked like it would require ~30 lines!).

I'd say 30 lines of code is not very scary, anyway. (Alternatively, we could also refactor lcm_image_array_to_images.h to extract the png decoder as a callable helper function.)

This also leads to issues about having multiple clients all respond to the capture command by sending you images. What should Meshcat do then? As a passable work-around, I now instead implemented void CaptureImage(std::filesystem::path filename, int x_resolution, int y_resolution). Every message that gets received simply gets written to that file on disk. If multiple responses come in, the last writer wins. Not ideal, but at least it's unambiguous.

Unless done carefully, this would lead to race conditions, where the file is overwritten while someone else is reading it. We'd need to be careful to write the image to a tempfile in the same output directory, and then after it's done writing and closed, rename it to the final filename. (File renames within the same directory are atomic.)
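
(A sketch of that tempfile-plus-rename pattern in Python, for illustration only; the paths and function name are placeholders, not Drake code:)

import os
import tempfile

def atomic_write(final_path, data):
    # Write to a temp file in the same directory as the destination, then
    # rename it into place. os.replace within one directory is atomic, so a
    # reader sees either the old file or the complete new one, never a mix.
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write("meshcat_capture.png", b"...png bytes...")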

In any case, I don't love leaving it up to chance. If there are multiple connections, I think we either need to have GetImage throw an exception immediately, or else we designate either "first connection wins" or "last connection wins" and only ask for one image and only return that one.

In the extreme, I suppose we could return a vector of images (one per connection), but I'm not sure anyone would want that?

RussTedrake commented 1 year ago

Ok. I've taken one more pass based on @jwnimmer-tri's feedback.

My branch now does everything we like except it still fails for reasonably sized images. I haven't looked into that yet. https://github.com/RobotLocomotion/drake/compare/master...RussTedrake:drake:meshcat_capture_image?expand=1

Specifically, in a notebook, if you run a first cell

from pydrake.all import StartMeshcat
meshcat = StartMeshcat()

then connect to Meshcat in your browser, you can run a second cell

import matplotlib.pyplot as plt

capture = meshcat.CaptureImage(100, 100, timeout=1)
if capture.size() > 0:
    fig, ax = plt.subplots()
    ax.imshow(capture.data)

to see a 100x100 image like the screenshot attached to the original comment.

I think we just need to look into the message size issue, and then clean up the still messy branch.

jwnimmer-tri commented 1 year ago

... I'd need to bring in a png reader ...

We have a from-memory PNG reader more easily available now. See drake/systems/sensors/vtk_image_reader_writer.h for the API and drake/systems/sensors/lcm_image_array_to_images.cc for an example of using it.

jwnimmer-tri commented 1 year ago

FYI I'm going to see if I can find any fix for the larger images. I want to use this feature to generate new images for #18913 semi-automatically.

Edit:

Here's my branch. I got far enough to rebase and port to the VTK PNG reader, and confirmed that I also see the crash. I ran out of time to find a fix. (I only ended up needing 1 screenshot, so I just did it by hand.)

jwnimmer-tri commented 7 months ago

FYI The PNG reading part is easy now. We have a class to load images in PNG format (and more) into Drake's Image class: https://drake.mit.edu/doxygen_cxx/classdrake_1_1systems_1_1sensors_1_1_image_io.html

CWEzio commented 2 days ago

Since this issue hasn't been resolved yet, for anyone who wants to capture images from MeshCat, you can try this workaround:

from selenium import webdriver
import base64
import os

# Set up Chrome options to run in headless mode
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")  # Run Chrome in headless mode

# Initialize the Chrome WebDriver 
driver = webdriver.Chrome(options=chrome_options)

# Open the Meshcat viewer's URL
# (`meshcat` is an existing Drake Meshcat instance, e.g. from pydrake's StartMeshcat())
driver.get(meshcat.web_url())
image_data_url = driver.execute_script("""
    return viewer.capture_image(1920, 1080);  // Capture the image with width and height
""")
# Extract the base64 part of the data URL (after "data:image/png;base64,")
image_base64 = image_data_url.split(",")[1]

# Decode the base64 string and save it as an image file
image_data = base64.b64decode(image_base64)
# Write the image to a file (creating the output directory if needed)
os.makedirs("./test_data", exist_ok=True)
with open("./test_data/meshcat_image.png", "wb") as f:
    f.write(image_data)

print("Image saved as 'meshcat_image.png'")

# Close the browser session
driver.quit()