Closed: jgueldenstein closed this issue 2 years ago
Thank you for the suggestion; it is unclear to us what specifically is problematic about the current implementation on your end. Is it the performance, or the hassle of having to add the camera node?
Additionally, `getImage` (contrary to its C counterpart) requires copying the image from GPU to CPU under the hood rather than accessing it directly, which is fairly slow as well. What might add additional overhead is the JPEG compression when the image is exported, in which case adding the possibility of exporting a raw image might be sufficient to achieve performance comparable to a `getImage` call. What might indeed be an issue with this method is the fact that any camera overlays would be visible in the acquisition as well, which might not be what you want. As for a `getViewpointImage` function, you'd still need to manually move the viewpoint prior to any acquisition, which doesn't appear significantly different from the current approach of having to move the supervisor that contains the camera node. So in the end, it seems the only difference would be precisely the fact that you currently need the intermediary step of adding a camera node, or am I missing something?

Adding the camera node is possible, but requires some additional work. If one uses multiple robots, it becomes a lot of work. It also does not allow directly using integrated robot PROTOs (e.g. the Darwin-OP), as you need to modify them. As an example: I have a gym environment set up with 9 different humanoid robots. The camera-based solution would therefore require changes to 9 PROTOs, while a direct function would not require any changes.
Furthermore, the camera would move with the robot. It is of course possible to compute and set the transform accordingly, but this is less straightforward than just setting the viewpoint in the global frame. If the camera should simply stay in one position (a common use case), this is especially complicated compared to providing one pose in the world frame and keeping it there.
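Keeping a robot-mounted camera at a fixed world pose indeed requires re-expressing the desired world-frame pose in the robot's (moving) local frame at each step. A minimal planar sketch of that computation, with poses reduced to `(x, y, yaw)` tuples; Webots itself uses 3-D translations and axis-angle rotations, so this is illustrative only:

```python
import math

def world_to_robot(robot_pose, camera_world_pose):
    """Express a desired camera pose (given in the world frame) in the
    robot's local frame, so it can be written to the camera's translation
    and rotation fields each step.  Planar (x, y, yaw) sketch only."""
    rx, ry, rt = robot_pose
    cx, cy, ct = camera_world_pose
    dx, dy = cx - rx, cy - ry
    # Rotate the world-frame offset by -yaw to land in the robot frame.
    cos_t, sin_t = math.cos(-rt), math.sin(-rt)
    local_x = cos_t * dx - sin_t * dy
    local_y = sin_t * dx + cos_t * dy
    return (local_x, local_y, ct - rt)
```

Applying this each simulation step (e.g. via a Supervisor writing the camera node's pose fields) would keep the camera visually pinned in the world, at the cost of the bookkeeping described above.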
I did not know that the difference to using a file is not that significant. In that case, just using this approach would be a solution. Still, I think users might expect it to be very slow and start implementing the camera-based approach unnecessarily. Maybe adding a bit of additional documentation to the function would then be enough.
I tried using the `exportImage()` method to get images from the simulation and encountered two issues: `supervisor.simulationSetMode(mode)` can be used to set the simulation to `SIMULATION_MODE_REAL_TIME` speed, but this does not enable rendering if it was previously disabled. One would therefore need to constantly render the simulation just to get some images at some point. I therefore don't think the `exportImage()` function is usable in this case.
This leaves only the option of adding an extra camera to the robot, which is, in my opinion, not the best approach, as each robot model needs to be modified.
I also searched other code bases that use Webots as a basis for OpenAI Gym environments. None of them had a render option, which is probably due to the missing option of getting the image from code. I think this is unfortunate, as rendering is a standard use case in RL.
I believe the best option is to have a supervisor robot equipped with a camera that takes pictures of the scene (you can choose the camera position, orientation, resolution, etc.) and saves them as a raw RGB file which the robot controller programs can read without too much overhead (no compression/decompression). If you need extra performance, you can save the image in a memory-mapped file.
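The raw-RGB memory-mapped file idea could be sketched like this on the writer side. The file layout, a 12-byte width/height/frame-counter header followed by packed RGB pixels, is an assumption of this sketch, not a Webots convention:

```python
import mmap
import struct

HEADER = struct.Struct('III')  # width, height, frame counter

def create_image_map(path, width, height):
    """Create a file sized for one raw RGB frame plus the small header,
    and return a writable memory map of it."""
    size = HEADER.size + width * height * 3
    with open(path, 'wb') as f:
        f.truncate(size)
    with open(path, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), size)  # mmap dups the fd, so f can close
    mm[:HEADER.size] = HEADER.pack(width, height, 0)
    return mm

def publish_frame(mm, width, height, rgb_bytes, frame_index):
    """Copy one raw RGB frame into the map, then bump the frame counter
    so readers can poll for new images (no compression, no file re-open)."""
    mm[HEADER.size:HEADER.size + width * height * 3] = rgb_bytes
    mm[:HEADER.size] = HEADER.pack(width, height, frame_index)
```

The frame counter lets readers poll cheaply for new images instead of watching file timestamps; inside a Webots supervisor controller, the `rgb_bytes` would come from the attached camera.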
The problem with this solution is that only one robot controller can be executed per process, which means inter-process communication is required.
Since this is quite a common requirement in RL, I believe a simple way to enable rendering and get an image from the viewpoint would benefit many users.
> The problem with this solution is that only one robot controller can be executed per process. This would mean that inter-process communication is required.
Yes, but if the supervisor controller shares images in memory-mapped files, it is super easy for robot controllers to read these images. I recently developed a simple example of memory-mapped files which you can re-use for this purpose.
Note: we are currently working on a new system that isolates Webots and robot controllers in different Docker containers. The new system requires using memory-mapped files for images; once complete, it will be super easy for any robot controller to access the camera image of another robot controller, thus providing zero-copy performance when sharing camera images across Webots controllers.
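On the consuming side, any controller could map the same file read-only. This sketch assumes an ad-hoc layout of a 12-byte `'III'` header (width, height, frame counter) followed by raw RGB pixels; neither the file name nor the header is something Webots defines:

```python
import mmap
import struct

def read_latest_frame(path):
    """Map the shared file read-only and return
    (width, height, frame_index, rgb_bytes)."""
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            header = struct.Struct('III')
            width, height, index = header.unpack(mm[:header.size])
            # Copy the pixel bytes out before the map is closed.
            pixels = bytes(mm[header.size:header.size + width * height * 3])
    return width, height, index, pixels
```

A gym `render()` implementation could call this each time it is invoked and compare `frame_index` against the previously seen value to detect stale frames.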
Since implementing a custom IPC solution seemed to be more work, I used the earlier-mentioned approach of adding an additional camera to the robot, which was successful. Below I will leave a few pointers on how to implement this, in case somebody else stumbles on this issue in the future:
In my solution, the camera simply moves with the robot's base. This could be improved by adding a transform that is adapted at each step, but I did not spend time implementing that, since I only use this for debugging purposes, where it does not matter.
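One detail worth noting for anyone implementing the camera-based `render()`: in the Python API, `Camera.getImage()` returns the frame as BGRA bytes, so a channel reorder is usually needed before handing frames to a video writer. A minimal pure-Python sketch of that conversion (NumPy, or `Camera.getImageArray()`, would be faster alternatives; this just shows the byte layout):

```python
def bgra_to_rgb(bgra, width, height):
    """Reorder a Webots BGRA frame (the byte order Camera.getImage()
    uses in the Python API) into packed RGB, dropping the alpha channel."""
    rgb = bytearray(width * height * 3)
    for i in range(width * height):
        rgb[3 * i] = bgra[4 * i + 2]      # R
        rgb[3 * i + 1] = bgra[4 * i + 1]  # G
        rgb[3 * i + 2] = bgra[4 * i]      # B
    return bytes(rgb)
```

With this, a `render()` method can return the converted bytes (or reshape them into a `height x width x 3` array) in the `rgb_array` format that Gym video recorders expect.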
When using Webots for reinforcement learning with the OpenAI Gym environment, the `render()` method can only be implemented in quite a hacky way. In reinforcement learning it is often required to have qualitative results (video renderings) to get insight into the learning process.

One option is to attach a camera node to the robot, move it to the Viewpoint, and then use the `getImage()` method of the Camera node.

Another option is to use the `exportImage()` method of the Supervisor. This saves the image to the filesystem, though, and it has to be loaded again, which is obviously slow.

A better option would be a method of the Supervisor such as `getViewpointImage()`, which would return the image similarly to the `getImage()` method of the Camera node.