UT-Austin-RPL / MUTEX


Question about the dataset format #1

Open pangxincheng opened 10 months ago

pangxincheng commented 10 months ago

Hi, thanks for your great work. I want to deploy MUTEX on my robotic arm, but I am not very clear about the data format. Specifically, I have checked the real-world dataset you provided, and it has the following format:

{
    "data": {
        "demo_0": {
            "actions": shape (N, 7),
            "obs": {
                "agentview_rgb": shape (N, 128, 128, 3),
                "ee_states": shape (N, 16),
                "eye_in_hand_rgb": shape (N, 128, 128, 3),
                "gripper_states": shape (N, 1),
                "joint_states": shape (N, 7),
            }
        },
        ...
    }
}

I understand the meanings of agentview_rgb, eye_in_hand_rgb, gripper_states, and joint_states, but I am not clear about what each value of actions and ee_states represents. Can you explain their meanings? Thanks!
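
For reference, here is roughly how I listed the structure above. This is only a minimal h5py sketch; the file name is one of the provided real-world demos, and it is assumed to be in the working directory.

import h5py

def print_shapes(name, obj):
    # print the shape of every dataset in the file
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape)

with h5py.File("RW8_open_the_top_drawer_demo.hdf5", "r") as f:
    f.visititems(print_shapes)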

pangxincheng commented 10 months ago

I have debugged eval.py, and now I understand that ee_states represents the end-effector transformation matrix and that actions means xyz + axis-angle + gripper.

But when I load the real-world data, I find a discrepancy between actions and ee_states. In my understanding, ee_states_{t+1} = ee_states_{t} + actions_{t}, but this does not hold.

My code is as follows:

import h5py
import numpy as np
from scipy.spatial.transform import Rotation as R
import robosuite.utils.transform_utils as trans

def calc_goal_rotmat(delta: np.ndarray, current_orientation: np.ndarray):
    quat_error = trans.axisangle2quat(delta)
    rotation_mat_error = trans.quat2mat(quat_error)
    goal_orientation = np.dot(rotation_mat_error, current_orientation)
    return goal_orientation

def scale_action(action: np.ndarray):
    """
    this scaling is applied when I run eval.py;
    adapted from robosuite.controllers.base_controller.Controller.scale_action
    [robosuite](https://github.com/ARISE-Initiative/robosuite/blob/b9d8d3de5e3dfd1724f4a0e6555246c460407daa/robosuite/controllers/base_controller.py#L104)
    action: np.ndarray shape = (N, 6)
    """
    input_min = np.array([-1., -1., -1., -1., -1., -1.])
    input_max = np.array([1., 1., 1., 1., 1., 1.])
    output_max = np.array([0.05, 0.05, 0.05, 0.5 , 0.5 , 0.5 ])
    output_min = np.array([-0.05, -0.05, -0.05, -0.5 , -0.5 , -0.5 ])
    action_scale = np.abs(output_max - output_min) / np.abs(input_max - input_min)
    action_output_transform = (output_max + output_min) / 2.0
    action_input_transform = (input_max + input_min) / 2.0
    return (action - action_input_transform) * action_scale + action_output_transform

f = h5py.File("RW8_open_the_top_drawer_demo.hdf5", "r")  # load the real-world data
actions = np.array(f["data/demo_0/actions"])
actions[:, :-1] = scale_action(actions[:, :-1])  # scale action

ee_states = np.array(f["data/demo_0/obs/ee_states"]).reshape(-1, 4, 4).transpose(0, 2, 1)  # ee pose in base frame (N, 4, 4)
ee_rotmat = ee_states[:, :3, :3]  # (N, 3, 3)

for i in range(ee_rotmat.shape[0] - 1):
    target_rotmat = calc_goal_rotmat(actions[i, 3:6], ee_rotmat[i])
    print("max rot difference:", np.abs(R.from_matrix(target_rotmat).as_rotvec() - R.from_matrix(ee_rotmat[i + 1]).as_rotvec()).max(), end=", ")
    print("max xyz difference:", np.abs(ee_states[i + 1, :3, 3] - (ee_states[i, :3, 3] + actions[i, :3])).max())
f.close()

The output is as follows:

max rot difference: 0.06717731695839074, max xyz difference: 0.026428571428571412
max rot difference: 0.12910619805085455, max xyz difference: 0.017571192556700033
max rot difference: 0.12899400616200962, max xyz difference: 0.016694306660440306
max rot difference: 0.1273171033890601, max xyz difference: 0.014875318641103308
max rot difference: 0.11722738636511609, max xyz difference: 0.01575476722665664
max rot difference: 0.09960029660031305, max xyz difference: 0.020507130148286645
max rot difference: 0.0881736445381789, max xyz difference: 0.024517996159409694
max rot difference: 0.08918299972444184, max xyz difference: 0.02454872393192914
max rot difference: 0.0917978538396374, max xyz difference: 0.025392810946219313
max rot difference: 0.10303586422841801, max xyz difference: 0.023936941918841825
max rot difference: 0.10536312644094174, max xyz difference: 0.023651353628730443
max rot difference: 0.10443637685785037, max xyz difference: 0.01868286831938254
max rot difference: 0.1047047346610528, max xyz difference: 0.015359910272096378
...

I think there may be a problem with the parameters in scale_action. Could you help me resolve it? @ShahRutav Thanks.

ShahRutav commented 10 months ago

Hi,

Thanks for showing interest in our work. The relation ee_states_{t+1} = ee_states_{t} + actions_{t} can have a relatively large error, since the robot may not fully execute action_{t} before the next action is predicted.

On a side note, we do not use ee_states in our code.
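
One quick way to sanity-check this, assuming the robosuite-style scaling applied above (roughly ±0.05 m per step for position), is to compare the commanded per-step displacement with the displacement actually realized between consecutive frames:

import h5py
import numpy as np

with h5py.File("RW8_open_the_top_drawer_demo.hdf5", "r") as f:
    actions = np.array(f["data/demo_0/actions"])
    ee_states = np.array(f["data/demo_0/obs/ee_states"]).reshape(-1, 4, 4).transpose(0, 2, 1)

# commanded position delta per step (assuming the +-0.05 m robosuite-style scaling)
commanded = actions[:-1, :3] * 0.05
# displacement the end-effector actually achieved between consecutive frames
achieved = ee_states[1:, :3, 3] - ee_states[:-1, :3, 3]

ratio = np.linalg.norm(achieved, axis=1) / np.maximum(np.linalg.norm(commanded, axis=1), 1e-8)
print("mean fraction of commanded motion achieved per step:", ratio.mean())

If this fraction is well below 1, the mismatch you see is a tracking effect rather than a problem with the data or the scaling.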

pangxincheng commented 10 months ago

Thanks for resolving my doubts 😄 @ShahRutav. I have another question: does the action scaling in the real world use the same parameters as robosuite (i.e., the following)?

input_min = np.array([-1., -1., -1., -1., -1., -1.])
input_max = np.array([1., 1., 1., 1., 1., 1.])
output_max = np.array([0.05, 0.05, 0.05, 0.5 , 0.5 , 0.5 ])
output_min = np.array([-0.05, -0.05, -0.05, -0.5 , -0.5 , -0.5 ])

This is important for deploying MUTEX on my robotic arm. Looking forward to your answer.

ShahRutav commented 10 months ago

Hi,

Due to hardware constraints, the action scaling can be very different in simulation and the real world. We use the deoxys controller for our real-world experiments; the FrankaInterface class might be of interest.

I can attach the controller configs (as soon as I can access the robot) if that is helpful. Let me know if you need them!
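
In the meantime, the general pattern is the same affine mapping as in your scale_action above; only the output bounds change to whatever your controller accepts. Below is a minimal sketch with placeholder bounds; the helper name scale_to_controller and the numbers are illustrative only, and the real limits should come from the controller config.

import numpy as np

# placeholder bounds -- replace with the limits of your own real-world controller
REAL_OUTPUT_MIN = np.array([-0.05, -0.05, -0.05, -0.5, -0.5, -0.5])
REAL_OUTPUT_MAX = np.array([ 0.05,  0.05,  0.05,  0.5,  0.5,  0.5])

def scale_to_controller(action, out_min=REAL_OUTPUT_MIN, out_max=REAL_OUTPUT_MAX):
    """Map a normalized action in [-1, 1] to the controller's command range."""
    action = np.clip(action, -1.0, 1.0)
    return (action + 1.0) / 2.0 * (out_max - out_min) + out_min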