ARISE-Initiative / robomimic

robomimic: A Modular Framework for Robot Learning from Demonstration
MIT License

How to calculate the action according to the existing data in the hdf5 file? #141

Closed. Junxix closed this issue 8 months ago.

Junxix commented 8 months ago

Hi, I'm wondering how to calculate the action between state[i] and state[i+2] from the existing data in the hdf5 file. For example, for the can (PH) task, I downloaded the relevant hdf5 file from your website, so I know the recorded action between each pair of adjacent states and the various state parameters stored in the file. Can you tell me how the action between two adjacent states is computed once we know the _end_effectorpos and _end_effectorquat for each state? And what if I want to calculate the action between state[i] and state[i+2]? Thanks!

amandlek commented 8 months ago

In general, this isn't possible unless an exact model of the system dynamics is known. The action is an end effector pose target sent to the controller, which then results in an updated state of the robot, as well as of the other objects in the scene. The robot is not guaranteed to reach the target sent to the controller, so the recorded states alone do not determine the action.
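To make the point above concrete, here is a minimal sketch that compares the observed end effector displacement with the commanded one, using the demo_1 numbers that Junxix prints further down in this thread. It rests on two assumptions that the thread does not confirm: that the first three action components are a normalized position delta, and that the controller scales them by 0.05 m per unit (the constant `ASSUMED_MAX_POS_DELTA_M` is hypothetical and should be checked against the controller config of the environment that produced the dataset).

```python
# Values copied from the demo_1 printout quoted later in this thread.
eef_pos_0 = [-0.03935597, -0.0766327, 1.02109479]  # obs/robot0_eef_pos[0]
eef_pos_1 = [-0.04047182, -0.0808687, 1.0188843]   # obs/robot0_eef_pos[1]
action_0 = [0.024, -0.031, -0.055,
            -0.00403383, 0.1117506, -0.05927867, -1.0]  # actions[0]

# ASSUMPTION (not stated in the thread): action[:3] is a normalized position
# delta and one unit corresponds to 0.05 m. Check the controller config.
ASSUMED_MAX_POS_DELTA_M = 0.05


def observed_pos_delta(p_from, p_to):
    """Displacement the end effector actually made between two states."""
    return [b - a for a, b in zip(p_from, p_to)]


def commanded_pos_delta(action, scale=ASSUMED_MAX_POS_DELTA_M):
    """Displacement the action asked for, under the assumed scaling."""
    return [scale * a for a in action[:3]]


observed = observed_pos_delta(eef_pos_0, eef_pos_1)
commanded = commanded_pos_delta(action_0)
mismatch = [o - c for o, c in zip(observed, commanded)]

print("observed  (m):", observed)
print("commanded (m):", commanded)
print("mismatch  (m):", mismatch)
```

With these numbers the observed x displacement is negative while the commanded x component is positive, so the sign mismatch holds regardless of the assumed scale. The state difference reflects what the controller and the physics produced, not the target that was sent.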

Junxix commented 8 months ago

I'm sorry, I still don't understand how this action is calculated. Given the _end_effectorpos and _end_effectorquat, how do I calculate the action?

>>> import h5py
>>> dp = 'robomimic/can/mh/low_dim_v141_saved.hdf5'
>>> f = h5py.File(dp, 'r')   
>>> print(f["data/demo_1/obs/"].keys())
<KeysViewHDF5 ['object', 'robot0_eef_pos', 'robot0_eef_quat', 'robot0_eef_vel_ang', 'robot0_eef_vel_lin', 'robot0_gripper_qpos', 'robot0_gripper_qvel', 'robot0_joint_pos', 'robot0_joint_pos_cos', 'robot0_joint_pos_sin', 'robot0_joint_vel']>
>>> print(f["data/demo_1/obs/robot0_eef_pos"][:2])
[[-0.03935597 -0.0766327   1.02109479]
 [-0.04047182 -0.0808687   1.0188843 ]]
>>> print(f["data/demo_1/obs/robot0_eef_quat"][:2])
[[ 0.99789604  0.02813489  0.05833099 -0.00306922]
 [ 0.99824317  0.02196068  0.05499304 -0.00201769]]
>>> print(f["data/demo_1/actions"][:1])
[[ 0.024      -0.031      -0.055      -0.00403383  0.1117506  -0.05927867
  -1.        ]]

amandlek commented 8 months ago

Reiterating my previous reply, you cannot do this. The actions were collected during teleoperation and sent to the simulator at each timestep, resulting in the next state of the world. You are asking to infer the action from subsequent observations, and we do not have an analytical way to do this.
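The same check can be sketched for orientation. The snippet below computes the rotation actually observed between the two quaternions printed above and compares its magnitude with the rotation components of actions[0]. It is an illustration under assumptions the thread does not confirm: that robot0_eef_quat is stored in (x, y, z, w) order, that action[3:6] is a normalized axis-angle delta, and that one unit corresponds to 0.5 rad (the constant `ASSUMED_MAX_ROT_DELTA_RAD` is hypothetical).

```python
import math

# Values copied from the demo_1 printout above.
quat_0 = [0.99789604, 0.02813489, 0.05833099, -0.00306922]  # robot0_eef_quat[0]
quat_1 = [0.99824317, 0.02196068, 0.05499304, -0.00201769]  # robot0_eef_quat[1]
action_rot = [-0.00403383, 0.1117506, -0.05927867]          # actions[0][3:6]

# ASSUMPTIONS (not stated in the thread): quaternions are (x, y, z, w), and
# one unit of action[3:6] corresponds to 0.5 rad of axis-angle rotation.
ASSUMED_MAX_ROT_DELTA_RAD = 0.5


def quat_conjugate(q):
    x, y, z, w = q
    return [-x, -y, -z, w]


def quat_multiply(a, b):
    """Hamilton product a * b for quaternions in (x, y, z, w) order."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return [
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
        aw * bw - ax * bx - ay * by - az * bz,
    ]


def quat_to_rotvec(q):
    """Convert a (near-)unit quaternion to an axis-angle rotation vector."""
    x, y, z, w = q
    if w < 0:  # pick the representation with the smaller rotation angle
        x, y, z, w = -x, -y, -z, -w
    norm_v = math.sqrt(x * x + y * y + z * z)
    if norm_v < 1e-12:
        return [0.0, 0.0, 0.0]
    angle = 2.0 * math.atan2(norm_v, w)
    return [angle * c / norm_v for c in (x, y, z)]


# Rotation taking orientation 0 to orientation 1, expressed in the fixed frame.
observed_rotvec = quat_to_rotvec(quat_multiply(quat_1, quat_conjugate(quat_0)))
observed_angle = math.sqrt(sum(c * c for c in observed_rotvec))
commanded_angle = ASSUMED_MAX_ROT_DELTA_RAD * math.sqrt(
    sum(c * c for c in action_rot)
)

print("observed rotation (rad): ", observed_angle)
print("commanded rotation (rad):", commanded_angle)
```

For this pair of states the observed rotation is on the order of 0.01 rad, several times smaller than the commanded rotation under the assumed scale. That gap is consistent with the reply above: the recorded action is a target, and the next state only shows how far the robot got toward it.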