Open pangxincheng opened 10 months ago
Hi, thanks for your great work. I want to deploy your work on my robotic arm, but I am not very clear about the format of the data. Specifically, I have checked the realworld dataset you provided and it has the following format:
```
{
  "data": {
    "demo_0": {
      "actions": shape (N, 7),
      "obs": {
        "agentview_rgb": shape (N, 128, 128, 3),
        "ee_states": shape (N, 16),
        "eye_in_hand_rgb": shape (N, 128, 128, 3),
        "gripper_states": shape (N, 1),
        "joint_states": shape (N, 7),
      }
    },
    ...
  }
}
```
I can understand the meanings of `agentview_rgb`, `eye_in_hand_rgb`, `gripper_states`, and `joint_states`, but I am not clear about the specific meaning of each value in `actions` and `ee_states`. Can you explain their meanings? Thanks!
I have debugged `eval.py`, and now I understand that `ee_states` represents the end-effector transformation matrix and `actions` means xyz + axis-angle + gripper.
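For reference, unpacking one step then looks like this. The sample values here are made up for illustration; in the real dataset they would come from `f["data/demo_0/obs/ee_states"][t]` and `f["data/demo_0/actions"][t]`, and the column-major assumption matches the transpose in my loading code below:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Made-up sample step (identity pose, small rotation command) for illustration.
ee_state = np.eye(4).T.reshape(16)                        # flattened 4x4 pose, column-major
action = np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.5, -1.0])  # dx, dy, dz, axis-angle, gripper

# ee_states stores the transform column-major, hence the transpose after reshape.
pose = ee_state.reshape(4, 4).T              # 4x4 homogeneous transform in the base frame
position = pose[:3, 3]                       # end-effector xyz
orientation = R.from_matrix(pose[:3, :3])    # end-effector rotation

delta_xyz = action[:3]                       # commanded translation
delta_rot = R.from_rotvec(action[3:6])       # commanded rotation (axis-angle)
gripper = action[6]                          # gripper open/close command
```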
But when I load the real-world data, I find a discrepancy between `actions` and `ee_states`. In my understanding, `ee_states_{t+1} = ee_states_{t} + actions_{t}`, but this does not hold. My code is as follows:
```python
import h5py
import numpy as np
from scipy.spatial.transform import Rotation as R
import robosuite.utils.transform_utils as trans


def calc_goal_rotmat(delta: np.ndarray, current_orientation: np.ndarray):
    quat_error = trans.axisangle2quat(delta)
    rotation_mat_error = trans.quat2mat(quat_error)
    goal_orientation = np.dot(rotation_mat_error, current_orientation)
    return goal_orientation


def scale_action(action: np.ndarray):
    """
    This function is what I use when I run eval.py.
    Reference: robosuite.controllers.base_controller.Controller.scale_action
    https://github.com/ARISE-Initiative/robosuite/blob/b9d8d3de5e3dfd1724f4a0e6555246c460407daa/robosuite/controllers/base_controller.py#L104
    action: np.ndarray, shape (N, 6)
    """
    input_min = np.array([-1., -1., -1., -1., -1., -1.])
    input_max = np.array([1., 1., 1., 1., 1., 1.])
    output_max = np.array([0.05, 0.05, 0.05, 0.5, 0.5, 0.5])
    output_min = np.array([-0.05, -0.05, -0.05, -0.5, -0.5, -0.5])
    action_scale = np.abs(output_max - output_min) / np.abs(input_max - input_min)
    action_output_transform = (output_max + output_min) / 2.0
    action_input_transform = (input_max + input_min) / 2.0
    return (action - action_input_transform) * action_scale + action_output_transform


f = h5py.File("RW8_open_the_top_drawer_demo.hdf5", "r")  # load the real-world data
actions = np.array(f["data/demo_0/actions"])
actions[:, :-1] = scale_action(actions[:, :-1])  # scale the action
ee_states = np.array(f["data/demo_0/obs/ee_states"]).reshape(-1, 4, 4).transpose(0, 2, 1)  # ee pose in base frame, (N, 4, 4)
ee_rotmat = ee_states[:, :3, :3]  # (N, 3, 3)
for i in range(ee_rotmat.shape[0] - 1):
    target_rotmat = calc_goal_rotmat(actions[i, 3:6], ee_rotmat[i])
    print("max rot difference:", np.abs(R.from_matrix(target_rotmat).as_rotvec() - R.from_matrix(ee_rotmat[i + 1]).as_rotvec()).max(), end=", ")
    print("max xyz difference:", np.abs(ee_states[i + 1, :3, 3] - (ee_states[i, :3, 3] + actions[i, :3])).max())
f.close()
```
The output is as follows:
```
max rot difference: 0.06717731695839074, max xyz difference: 0.026428571428571412
max rot difference: 0.12910619805085455, max xyz difference: 0.017571192556700033
max rot difference: 0.12899400616200962, max xyz difference: 0.016694306660440306
max rot difference: 0.1273171033890601, max xyz difference: 0.014875318641103308
max rot difference: 0.11722738636511609, max xyz difference: 0.01575476722665664
max rot difference: 0.09960029660031305, max xyz difference: 0.020507130148286645
max rot difference: 0.0881736445381789, max xyz difference: 0.024517996159409694
max rot difference: 0.08918299972444184, max xyz difference: 0.02454872393192914
max rot difference: 0.0917978538396374, max xyz difference: 0.025392810946219313
max rot difference: 0.10303586422841801, max xyz difference: 0.023936941918841825
max rot difference: 0.10536312644094174, max xyz difference: 0.023651353628730443
max rot difference: 0.10443637685785037, max xyz difference: 0.01868286831938254
max rot difference: 0.1047047346610528, max xyz difference: 0.015359910272096378
...
```
I think there may be a problem with the parameters in the function `scale_action`. Can you help me solve it? @ShahRutav Thanks.
Hi,

Thanks for showing interest in our work. The equation `ee_states_{t+1} = ee_states_{t} + actions_{t}` might have a relatively high error, since the robot might not fully achieve `actions_{t}` before the action for the next time step is predicted.

On a side note, we do not use `ee_states` in our code.
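To illustrate with made-up numbers (not from the dataset): if the controller only partially tracks each command before the next action is predicted, you get exactly the kind of per-step residual shown in your printout, even though nothing is wrong with the data.

```python
import numpy as np

# Hypothetical single step, for illustration only: the commanded translation
# (the scaled action_t) versus what the robot actually achieves by step t+1.
commanded = np.array([0.05, 0.00, 0.02])   # scaled action_t, in meters
achieved = np.array([0.03, 0.00, 0.012])   # ee position at t+1 minus ee position at t

# This residual is the "max xyz difference" quantity from the script above.
residual = np.abs(achieved - commanded).max()

# The robot only partially tracked the command in this made-up step:
fraction = achieved[0] / commanded[0]
```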
Thanks for resolving my doubts 😄 @ShahRutav. I have another question: does the action scaling in the real world also use the same parameters as robosuite? (i.e. the following parameters)

```python
input_min = np.array([-1., -1., -1., -1., -1., -1.])
input_max = np.array([1., 1., 1., 1., 1., 1.])
output_max = np.array([0.05, 0.05, 0.05, 0.5, 0.5, 0.5])
output_min = np.array([-0.05, -0.05, -0.05, -0.5, -0.5, -0.5])
```

This is important for deploying MUTEX on my robotic arm. Looking forward to your answer.
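For context, with these symmetric bounds the scaling reduces to a per-dimension linear map, so the question is really whether the real-world controller uses the same slopes. A quick sketch of what the parameters do to a raw action (values made up):

```python
import numpy as np

input_min = np.full(6, -1.0)
input_max = np.full(6, 1.0)
output_max = np.array([0.05, 0.05, 0.05, 0.5, 0.5, 0.5])
output_min = -output_max

scale = (output_max - output_min) / (input_max - input_min)  # per-dimension slope
offset = (output_max + output_min) / 2.0                     # zero here, since bounds are symmetric

# A raw action of +1 maps to the maximum command: 0.05 m translation, 0.5 rad rotation.
raw = np.array([1.0, -1.0, 0.0, 1.0, 0.0, -0.5])
scaled = (raw - (input_max + input_min) / 2.0) * scale + offset
```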
Hi,
Due to hardware constraints, the action scaling can be very different in simulation and in the real world. We use the deoxys controller for our real-world experiments. The FrankaInterface class might be of interest.

I can attach the controller configs if that is helpful (as soon as I can access the robot). Let me know if you need them!