NVlabs / handover-sim

A simulation environment and benchmark for human-to-robot object handovers
https://handover-sim.github.io
BSD 3-Clause "New" or "Revised" License

How to introduce RGBD image to the policy network for the handover_sim environment #12

Closed hwpengTristin closed 11 months ago

hwpengTristin commented 1 year ago

I want to use RGB-D images as state and feed them to the policy network, but after my changes the simulator's rendering became very slow. I modified three parts of the code.

Specifically, I modified handover_env.py (class HandoverHandCameraPointStateEnv(HandoverEnv)) and run_benchmark_gaddpg_hold.py (class GADDPGPolicy(SimplePolicy) and class PointListener).

1) I want to feed the RGB and depth images to the network, so I added two more items to the observation, observation["rgb"] and observation["depth"], as follows:

class HandoverHandCameraPointStateEnv(HandoverEnv) in handover_env.py with modification details:

def _get_observation(self):
    observation = {}
    observation["frame"] = self.frame
    observation["panda_link_ind_hand"] = self.panda.LINK_IND_HAND
    observation["panda_body"] = self.panda.body
    observation["callback_get_point_state"] = self._get_point_state
    observation["rgb"] = self.panda._camera.color[0]  # added
    observation["depth"] = self.panda._camera.depth[0]  # added
    return observation

2) Pass obs["rgb"] and obs["depth"] to the policy network, with the following modifications:

class GADDPGPolicy(SimplePolicy) in run_benchmark_gaddpg_hold.py

def plan(self, obs):
    info = {}

    if (obs["frame"] - self._steps_wait) % self._steps_action_repeat == 0:
        point_state, obs_time = self._get_point_state_from_callback(obs)
        info["obs_time"] = obs_time

        if point_state.shape[1] == 0 and self._point_listener.acc_points.shape[1] == 0:
            action = np.array(self._cfg.ENV.PANDA_INITIAL_POSITION)
        else:
            ef_pose = self._get_ef_pose(obs)
            # changed: pass a (point_state, rgb, depth) tuple instead of point_state alone
            ef_pose_new = self._point_listener.run_network((point_state, obs["rgb"], obs["depth"]), ef_pose)
       ...
       ...

3) Concatenate RGB and depth and resize the result to (112, 112), with the following modifications:

class PointListener in run_benchmark_gaddpg_hold.py with modification details:

def _point_state_to_state(self, point_state, ef_pose):
    point_state, rgb, depth = point_state[0], point_state[1], point_state[2]
    rgb = rgb.numpy()[:, :, :3]
    depth = depth.numpy()
    depth = depth[:, :, np.newaxis]
    rgbd = np.concatenate([rgb, depth], axis=2)
    rgbd = np.transpose(rgbd, (2, 0, 1))
    rgbd_resized = np.zeros((4, 112, 112))
    for i in range(rgbd.shape[0]):
        rgbd_resized[i] = cv2.resize(rgbd[i], (112, 112))

    point_state = self._process_pointcloud(point_state, ef_pose)
    # image_state = np.array([])
    image_state=rgbd_resized
    obs = (point_state, image_state)
    return obs
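
For reference, the stacking step in 3) can be tried in isolation. The following is only a sketch under stated assumptions: stack_rgbd is a hypothetical helper that is not part of handover-sim, the inputs are assumed to already be NumPy arrays (the code above calls .numpy() on them first), and it uses plain nearest-neighbor index sampling instead of cv2.resize so that it runs without OpenCV. The resized values will therefore differ slightly from the cv2 version.

```python
import numpy as np


def stack_rgbd(color, depth, size=(112, 112)):
    """Hypothetical helper: build a (4, size[0], size[1]) RGB-D array.

    color: (H, W, 3) or (H, W, 4) array; any alpha channel is dropped.
    depth: (H, W) array.
    Resampling is nearest-neighbor, not cv2.resize's default interpolation.
    """
    rgb = np.asarray(color, dtype=np.float64)[:, :, :3]
    d = np.asarray(depth, dtype=np.float64)[:, :, np.newaxis]
    rgbd = np.concatenate([rgb, d], axis=2)  # (H, W, 4)

    # Pick one source row/column index per output row/column.
    h, w = rgbd.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    resized = rgbd[rows[:, None], cols[None, :], :]  # (size[0], size[1], 4)

    # Channels first, matching the (4, 112, 112) layout used above.
    return np.transpose(resized, (2, 0, 1))


# Synthetic inputs standing in for the rendered images.
color = np.zeros((224, 224, 4), dtype=np.uint8)
color[..., 0] = 255
depth = np.full((224, 224), 0.5, dtype=np.float32)

rgbd = stack_rgbd(color, depth)
print(rgbd.shape)
```

Resizing all four channels in one indexing operation avoids the per-channel loop, but the stacking itself is cheap compared with the rendering.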

Is there something wrong with these modifications? I would like to know how to feed RGB-D images to the policy network in the handover_sim environment.

ychao-nvidia commented 1 year ago

Yes, you can render the color image from Bullet with color = self._camera.color[0].numpy() (just like depth here).

It becomes slow because your code above renders at every simulation step. By default handover-sim runs the simulation with a step size of 0.001 seconds, while GA-DDPG only runs the policy every 0.150 seconds (see here and here), so most of those renders are never used. To avoid unnecessary rendering, we only render at the steps where the policy actually runs. This is done by not returning images in obs but returning a callback instead, and letting the control flow run the callback only when needed (see here, here, here, and here).

So what you should do is to remove the below lines in HandoverHandCameraPointStateEnv._get_observation() here:

    observation["rgb"]=self.panda._camera.color[0]
    observation["depth"]=self.panda._camera.depth[0]

and instead add another return variable for color in PandaHandCamera.get_point_state() here.
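
As a rough illustration of the callback pattern described above, here is a minimal, self-contained sketch. FakeCamera, ToyEnv, and run are stand-ins invented for this example and are not handover-sim classes; only the "callback_get_point_state" key and the 0.001 s / 0.150 s timings come from this thread. The callback returns the color image as an extra value, and the camera counts its renders so the saving is visible.

```python
class FakeCamera:
    """Stand-in for a simulator camera; counts how often it renders."""

    def __init__(self):
        self.render_count = 0

    def render(self):
        # A real camera would call into the renderer here (the slow part).
        self.render_count += 1
        color = [[(0, 0, 0)]]  # placeholder 1x1 RGB image
        depth = [[1.0]]        # placeholder 1x1 depth image
        return color, depth


class ToyEnv:
    """Toy environment whose observation carries a callback, not images."""

    def __init__(self):
        self.camera = FakeCamera()
        self.frame = 0

    def _get_point_state(self):
        # Rendering happens only when this callback is invoked.
        color, depth = self.camera.render()
        point_state = depth  # placeholder for the real point cloud
        return point_state, color  # color is the extra return value

    def get_observation(self):
        return {
            "frame": self.frame,
            "callback_get_point_state": self._get_point_state,
        }

    def step(self):
        self.frame += 1
        return self.get_observation()


# 0.150 s policy period divided by the 0.001 s simulation step.
STEPS_ACTION_REPEAT = 150


def run(num_steps):
    """Simulate num_steps steps and return how many renders happened."""
    env = ToyEnv()
    obs = env.get_observation()
    for _ in range(num_steps):
        if obs["frame"] % STEPS_ACTION_REPEAT == 0:
            # Only policy steps pay for rendering.
            point_state, color = obs["callback_get_point_state"]()
        obs = env.step()
    return env.camera.render_count


print(run(450))  # prints 3
```

With rendering placed directly in the observation, the same 450 steps would trigger 450 renders instead of 3.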

Another potential avenue for speeding up Bullet's offscreen rendering, if you have a GPU, is Bullet's EGL rendering (see EGL rendering here). It can be turned on with this flag: SIM.BULLET.USE_EGL True (like this).

hwpengTristin commented 1 year ago

Thank you for your detailed explanation. I learnt a lot and appreciate the thoughtful design of the code.

I followed your suggestion and the code now works fine.