droid-dataset / droid

Distributed Robot Interaction Dataset.
https://droid-dataset.github.io/droid/

Converting cartesian velocities into cartesian positions #25

Open brunomachado37 opened 2 days ago

brunomachado37 commented 2 days ago

Hello,

First of all, thank you for the amazing open source work.

I was trying to convert between the different action types present in the dataset: specifically, given a commanded cartesian velocity (["action_dict"]["cartesian_velocity"]) and the robot state (["observation"]["cartesian_position"]), compute the commanded cartesian position (["action_dict"]["cartesian_position"]). This is particularly useful when training a model to predict velocities but performing position control.

If I understood correctly, all the different actions in the "action_dict" are computed from the cartesian_position, is that right?

If so, we should be able to re-compute one action from the other and vice versa, but I wasn't able to do so.

I tried to compute the velocities from the positions, based on the code at: https://github.com/droid-dataset/droid/blob/ca54513dc51d3305aa00f72bb1533aa7a7fba59f/droid/controllers/oculus_controller.py#L124 but the results I get are not consistent with the dataset.

I also tried to go the other way around and compute the positions from the velocities, using the code at: https://github.com/droid-dataset/droid/blob/ca54513dc51d3305aa00f72bb1533aa7a7fba59f/droid/franka/robot.py#L180 https://github.com/droid-dataset/droid/blob/ca54513dc51d3305aa00f72bb1533aa7a7fba59f/droid/robot_ik/robot_ik_solver.py#L8 but again, the results I got are not consistent with the dataset.
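
For context, here is a minimal sketch of the kind of conversion I was attempting (my own reconstruction, not code from the repository): I assume the cartesian_velocity is scaled element-wise by per-step limits (max_lin_delta / max_rot_delta) to get a pose delta, and that the rotational delta is composed with the current orientation as Euler angles. Both the constants and the composition order below are guesses.

import numpy as np
from scipy.spatial.transform import Rotation as R

# Assumed per-step limits; the real values depend on the robot_ik_solver /
# version_control configuration used during data collection.
MAX_LIN_DELTA = 0.075
MAX_ROT_DELTA = 0.15

def velocity_to_target_pose(cartesian_position, cartesian_velocity):
    """Integrate one normalized velocity command into an absolute target pose.

    cartesian_position: (6,) [x, y, z, roll, pitch, yaw] current pose
    cartesian_velocity: (6,) normalized command in [-1, 1]
    """
    lin_delta = np.asarray(cartesian_velocity[:3]) * MAX_LIN_DELTA
    rot_delta = np.asarray(cartesian_velocity[3:]) * MAX_ROT_DELTA

    target_pos = np.asarray(cartesian_position[:3]) + lin_delta
    # Compose the rotation delta with the current orientation; the "xyz"
    # convention and the composition order are assumptions on my part.
    target_rot = R.from_euler("xyz", rot_delta) * R.from_euler("xyz", cartesian_position[3:])
    return np.concatenate([target_pos, target_rot.as_euler("xyz")])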

To summarize: given ["action_dict"]["cartesian_velocity"] and the robot state, how can I reproduce the ["action_dict"]["cartesian_position"] values stored in the dataset?

AlexanderKhazatsky commented 2 days ago

Okay, so I think the one incorrect assumption here is that the oculus controller file has anything critical to do with calculating cartesian velocity / position. The only critical role it serves is producing outputs in a valid action space: it converts hand movements (which mean nothing in terms of control) into cartesian velocities (which do mean something for control).

create_action_dict should have all the required info to go from one action space to another. This includes the exact robot state at the time, which can also be found in the action_dict under the key "robot_state".

To clarify what you need: it seems the main reason you need this is to get the absolute end-effector position, given an initial position and a cartesian_velocity. When collecting data, we used the current position and a commanded cartesian_velocity and processed them through the create_action_dict function, so the resulting dict should contain "cartesian_velocity", an equivalent "cartesian_position", and the state at the time under "robot_state".
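
Roughly, the output looks like this (a sketch of the expected keys only, not the literal implementation; exact contents can differ per dataset version):

action_dict = {
    "cartesian_velocity": ...,   # the commanded velocity, as recorded
    "cartesian_position": ...,   # the equivalent absolute target pose
    "gripper_position": ...,
    "robot_state": ...,          # robot state at the moment the command was issued
}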

I think my confusion is that what you're asking for should already be happening and provided for you. Is there a reason that what the code provides isn't sufficient? Or are you saying the values aren't accurate?

brunomachado37 commented 1 day ago

Thank you for your fast reply.

Indeed, the create_action_dict function is already provided, but the values I get from it are not consistent with the values in the dataset.

Consider that, for a random episode from the dataset, I want to compute the actions in all the different action spaces, given the cartesian velocity (["action_dict"]["cartesian_velocity"]) and the robot state (["observation"]):

import numpy as np
import tensorflow_datasets as tfds
# create_action_dict comes from the droid codebase discussed above

dataset = tfds.load("droid", data_dir=".", split="train").take(1)

for episode in dataset:
    for step in episode["steps"]:
        # Robot proprioception taken from the observation (position-like keys only)
        robot_state = {k: v.numpy() for k, v in step["observation"].items() if "position" in k}
        # Commanded cartesian velocity plus gripper position, as a single action vector
        action = np.append(step["action_dict"]["cartesian_velocity"].numpy(), step["action_dict"]["gripper_position"].numpy())

        action_dict = create_action_dict(action, action_space="cartesian_velocity", gripper_action_space=None, robot_state=robot_state)

        # Compare the recomputed absolute position against the one stored in the dataset
        if np.isclose(action_dict["cartesian_position"], step["action_dict"]["cartesian_position"].numpy()).all():
            print("Computed cartesian_position matches the tfds one")

Although for some steps the values do match (to a precision of 3 or 4 decimal places), the overall results are inconsistent.

I've tried to run both trajectories (using the provided cartesian_position and the cartesian_position computed from the cartesian_velocity) on a Franka Panda, and the error is large enough to make the trajectories noticeably different.

The conversion in question only uses the cartesian_velocity_to_delta function (ignoring the gripper), which relies on the max_lin_delta and max_rot_delta values. I tested the different values from the version control files (droid/misc/version_control/), but neither gives positive results.
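
For reference, one rough way to back out the effective linear scale from the data itself, assuming the delta is simply velocity * max_lin_delta and using the observation position as the reference (this runs inside the per-step loop from the snippet above):

# Rough sketch: if target = current + lin_vel * max_lin_delta element-wise,
# then max_lin_delta ~= (target - current) / lin_vel wherever lin_vel is not
# close to zero.
lin_vel = step["action_dict"]["cartesian_velocity"].numpy()[:3]
target = step["action_dict"]["cartesian_position"].numpy()[:3]
current = step["observation"]["cartesian_position"].numpy()[:3]
mask = np.abs(lin_vel) > 1e-3
if mask.any():
    print("implied max_lin_delta per axis:", (target - current)[mask] / lin_vel[mask])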

So, my question is: if the different action spaces in the dataset were computed from the "cartesian_velocity" action space and the robot proprioception using the create_action_dict function, shouldn't those values match?

I believe I am making some wrong assumption, but I couldn't figure out where.

AlexanderKhazatsky commented 10 hours ago

One incorrect assumption here is that your "robot_state" value, which you are getting from the observation, is what was passed into "create_action_dict". In practice, the robot_state used during data collection was queried at a slightly different time (off by maybe 30 milliseconds), which can lead to noticeable drift, especially over time. What you want to do instead is get the robot_state value from step["action_dict"]["robot_state"] and pass that in. Can you let me know if that makes a difference?
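
Something like this, reusing your loop (a sketch; it assumes the TFDS release exposes step["action_dict"]["robot_state"] with the same position keys as the observation):

for episode in dataset:
    for step in episode["steps"]:
        # Take the robot state recorded alongside the action, rather than
        # the (slightly time-shifted) observation.
        robot_state = {k: v.numpy() for k, v in step["action_dict"]["robot_state"].items() if "position" in k}
        action = np.append(step["action_dict"]["cartesian_velocity"].numpy(),
                           step["action_dict"]["gripper_position"].numpy())

        action_dict = create_action_dict(action, action_space="cartesian_velocity",
                                         gripper_action_space=None, robot_state=robot_state)

        err = np.abs(action_dict["cartesian_position"] - step["action_dict"]["cartesian_position"].numpy()).max()
        print("max abs cartesian_position error:", err)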