Closed: jgueldenstein closed this issue 2 years ago
Thank you for the suggestion; it is unclear to us what specifically is problematic about the current implementation on your end. Is it the performance, or the hassle of having to add the camera node?
Additionally, `getImage` (contrary to its C counterpart) requires copying the image from GPU to CPU under the hood rather than accessing it directly, which is fairly slow as well. What might add additional overhead is the JPEG compression when the image is exported, in which case adding the possibility of exporting a raw image might be sufficient to achieve performance comparable to a `getImage` call. What might indeed be an issue with this method is the fact that any camera overlays would be visible in the acquisition as well, which might not be what you want. As for a `getViewpointImage` function, you'd still need to manually move the viewpoint prior to any acquisition, which doesn't appear significantly different from the current approach of having to move the supervisor that contains the camera node. So in the end, it seems the only difference would be precisely the fact that you currently need the intermediary step of adding a camera node, or am I missing something?

Adding the camera node is possible, but requires some additional work. If one uses multiple robots, it becomes a lot of work. It also does not allow directly using integrated robot PROTOs (e.g. the Darwin-OP), as you need to modify them. As an example: I have a gym environment set up with 9 different humanoid robots. The camera-based solution would therefore require changes to 9 PROTOs, while a direct function would not require any changes.
Furthermore, the camera would move with the robot. It is of course possible to compute and set the transform accordingly, but this is less straightforward than just setting the viewpoint in the global frame. If the camera should simply stay in one position (a common use case), this is especially complicated compared to providing one pose in the world frame and keeping it there.
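Keeping a robot-mounted camera at a fixed world pose indeed requires re-expressing the desired world-frame pose in the robot's (moving) local frame at each step. A minimal planar sketch of that computation, with poses reduced to `(x, y, yaw)` tuples; Webots itself uses 3-D translations and axis-angle rotations, so this is illustrative only:

```python
import math

def world_to_robot(robot_pose, camera_world_pose):
    """Express a desired camera pose (given in the world frame) in the
    robot's local frame, so it can be written to the camera's translation
    and rotation fields each step.  Planar (x, y, yaw) sketch only."""
    rx, ry, rt = robot_pose
    cx, cy, ct = camera_world_pose
    dx, dy = cx - rx, cy - ry
    # Rotate the world-frame offset by -yaw to land in the robot frame.
    cos_t, sin_t = math.cos(-rt), math.sin(-rt)
    local_x = cos_t * dx - sin_t * dy
    local_y = sin_t * dx + cos_t * dy
    return (local_x, local_y, ct - rt)
```

Applying this each simulation step (e.g. via a Supervisor writing the camera node's pose fields) would keep the camera visually pinned in the world, at the cost of the bookkeeping described above.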
I did not know that the difference to using a file is not that significant. In that case, just using this approach would be a solution. Still, I think users might expect it to be very slow and start implementing the camera-based approach unnecessarily. Maybe adding a bit of additional documentation to the function would then be enough.
I tried using the `exportImage()` method to get images from the simulation and encountered two issues: `supervisor.simulationSetMode(mode)` can be used to set the simulation to `SIMULATION_MODE_REAL_TIME` speed, but this does not enable rendering if it was previously disabled. One would therefore need to constantly render the simulation just to get some images at some point. I therefore don't think the `exportImage()` function is usable in this case.
This leaves only the option of adding an extra camera to the robot, which is, in my opinion, not the best approach, as each robot model needs to be modified.
I also searched other code bases that use Webots as a basis for OpenAI Gym environments. None of them had a render option, which is probably due to the missing option of getting the image from code. I think this is unfortunate, as rendering is a standard use case in RL.
I believe the best option is to have a supervisor robot equipped with a camera that takes pictures of the scene (you can choose the camera position, orientation, resolution, etc.) and saves them as a raw RGB file which the robot controller programs can read without too much overhead (no compression/decompression). If you need extra performance, you can save the image in a memory-mapped file.
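The raw-RGB memory-mapped file idea could be sketched like this on the writer side. The file layout, a 12-byte width/height/frame-counter header followed by packed RGB pixels, is an assumption of this sketch, not a Webots convention:

```python
import mmap
import struct

HEADER = struct.Struct('III')  # width, height, frame counter

def create_image_map(path, width, height):
    """Create a file sized for one raw RGB frame plus the small header,
    and return a writable memory map of it."""
    size = HEADER.size + width * height * 3
    with open(path, 'wb') as f:
        f.truncate(size)
    with open(path, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), size)  # mmap dups the fd, so f can close
    mm[:HEADER.size] = HEADER.pack(width, height, 0)
    return mm

def publish_frame(mm, width, height, rgb_bytes, frame_index):
    """Copy one raw RGB frame into the map, then bump the frame counter
    so readers can poll for new images (no compression, no file re-open)."""
    mm[HEADER.size:HEADER.size + width * height * 3] = rgb_bytes
    mm[:HEADER.size] = HEADER.pack(width, height, frame_index)
```

The frame counter lets readers poll cheaply for new images instead of watching file timestamps; inside a Webots supervisor controller, the `rgb_bytes` would come from the attached camera.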
The problem with this solution is that only one robot controller can be executed per process, which means inter-process communication is required.
Since this is quite a common requirement in RL, I believe a simple way to enable rendering and get an image from the viewpoint would benefit many users.
> The problem with this solution is that only one robot controller can be executed per process. This would mean that inter-process communication is required.
Yes, but if the supervisor controller shares images in memory-mapped files, it is super easy for robot controllers to read these images. I recently developed a simple example of memory-mapped files which you can re-use for this purpose.
Note: we are currently working on a new system that isolates Webots and robot controllers in different Docker containers. The new system requires using memory-mapped files for images; once complete, it will be super easy for any robot controller to access the camera image of another robot controller, thus providing zero-copy performance when sharing camera images across Webots controllers.
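On the consuming side, any controller could map the same file read-only. This sketch assumes an ad-hoc layout of a 12-byte `'III'` header (width, height, frame counter) followed by raw RGB pixels; neither the file name nor the header is something Webots defines:

```python
import mmap
import struct

def read_latest_frame(path):
    """Map the shared file read-only and return
    (width, height, frame_index, rgb_bytes)."""
    with open(path, 'rb') as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            header = struct.Struct('III')
            width, height, index = header.unpack(mm[:header.size])
            # Copy the pixel bytes out before the map is closed.
            pixels = bytes(mm[header.size:header.size + width * height * 3])
    return width, height, index, pixels
```

A gym `render()` implementation could call this each time it is invoked and compare `frame_index` against the previously seen value to detect stale frames.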
Since implementing a custom IPC solution seemed to be more work, I used the earlier-mentioned approach of adding an additional camera to the robot, which was successful. Below I will leave a few pointers on how to implement this, in case somebody else stumbles on this issue in the future:
In my solution, the camera simply moves with the robot's base. This could be improved by adding a transform that is adapted at each step, but I did not spend time implementing that, since I only use this for debugging purposes, where it does not matter.
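One detail worth noting for anyone implementing the camera-based `render()`: in the Python API, `Camera.getImage()` returns the frame as BGRA bytes, so a channel reorder is usually needed before handing frames to a video writer. A minimal pure-Python sketch of that conversion (NumPy, or `Camera.getImageArray()`, would be faster alternatives; this just shows the byte layout):

```python
def bgra_to_rgb(bgra, width, height):
    """Reorder a Webots BGRA frame (the byte order Camera.getImage()
    uses in the Python API) into packed RGB, dropping the alpha channel."""
    rgb = bytearray(width * height * 3)
    for i in range(width * height):
        rgb[3 * i] = bgra[4 * i + 2]      # R
        rgb[3 * i + 1] = bgra[4 * i + 1]  # G
        rgb[3 * i + 2] = bgra[4 * i]      # B
    return bytes(rgb)
```

With this, a `render()` method can return the converted bytes (or reshape them into a `height x width x 3` array) in the `rgb_array` format that Gym video recorders expect.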
When using Webots for reinforcement learning with the OpenAI Gym environment, the `render()` method can only be implemented in quite a hacky way. In reinforcement learning it is often required to have qualitative results (video renderings) to get insight into the learning process.

One option is to attach a camera node to the robot, move it to the Viewpoint, and then use the `getImage()` method of the Camera node.

Another option is to use the `exportImage()` method of the Supervisor. This saves the image to the filesystem, though, and it has to be loaded again, which is obviously slow.

A better option would be a method of the Supervisor such as `getViewpointImage()`, which would return the image similarly to the `getImage()` method of the Camera node.