JonathanYang0127 / omnimimic

Cross-Embodiment Robot Learning Codebase

Question on action space alignment for manipulation dataset #1

youliangtan commented 8 months ago

Hi Jonathan, great work on extreme cross-embodiment learning! The results are very comprehensive.

According to the paper, the action space for each manipulation task is aligned so that all tasks share a consistent coordinate frame. I believe this line is one example of flipping the y-axis.

Afaik, when flipping the x/y/z-axis, the rotation component also changes, so certain rx/ry/rz components of the action values need to be flipped as well. Not sure if that is considered. :thinking:

Also, regarding the frame of reference for the OXE manipulation action data, I believe most actions are still expressed in the base link frame. Would it be helpful to further transform them into the wrist camera frame (end-effector frame), since the theme is to make all observations "egocentric"? :thinking: Thanks!
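For concreteness, here is a minimal sketch of the rotation fix I mean, assuming a hypothetical 7-dof delta action `[dx, dy, dz, rx, ry, rz, gripper]` with the rotation stored as a rotation vector. Flipping the y-axis is the reflection F = diag(1, -1, 1); positions transform as p -> F p, and the delta rotation has to be conjugated, R -> F R F, which for a rotation vector works out to (rx, ry, rz) -> (-rx, ry, -rz):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

F = np.diag([1.0, -1.0, 1.0])  # reflection that flips the y-axis

def flip_y_action(action):
    # Hypothetical layout: [dx, dy, dz, rx, ry, rz, gripper], rotation as rotvec.
    pos, rotvec, grip = action[:3], action[3:6], action[6:]
    pos_f = F @ pos  # dy -> -dy
    # Conjugate the delta rotation by the reflection: R' = F R F
    # (F is its own inverse; det(F R F) = +1, so R' is still a proper rotation).
    R_f = F @ R.from_rotvec(rotvec).as_matrix() @ F
    rotvec_f = R.from_matrix(R_f).as_rotvec()
    # Equivalent closed form: rotvec_f == [-rx, ry, -rz]
    return np.concatenate([pos_f, rotvec_f, grip])
```

The same conjugation rule applies whichever axis is flipped; only the sign pattern on the rotation vector changes.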

JonathanYang0127 commented 8 months ago

Ahh, this was done with a separate data processing script so as not to slow down data loading. Both the rotations and the Cartesian delta actions were transformed in that script. Let me add it to the repo. The part reversing action dim 1 is an artifact of an older version and should not be there.
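As a rough sketch of what that offline pass looks like (hypothetical trajectory format, since the script isn't in the repo yet), the idea is to rewrite the actions once, before training, so the dataloader never pays the transform cost:

```python
import numpy as np

def preprocess_actions(trajectories, transform_action):
    # One-time offline pass over a hypothetical dataset: a list of dicts,
    # each with an "actions" array of shape (T, 7).
    for traj in trajectories:
        traj["actions"] = np.stack([transform_action(a) for a in traj["actions"]])
    return trajectories

# e.g. preprocess_actions(dataset, flip_y_action) with the transform sketched above
```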

I do agree that to fully align all of the OXE datasets, it would make sense to transform the actions into the end-effector frame. Note that the wrist camera frame and the end-effector frame are similar in most cases, but not exactly the same. This would also help with some of the "partial observability" problems that come from trying to predict delta actions from the egocentric camera alone (which I believe is an issue with many egocentric datasets in OXE). However, for robots where the end effector is not clearly in view of the camera, this can still cause issues, because most, if not all, of the datasets in OXE do not have calibrated wrist cameras.
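For reference, re-expressing a base-frame delta in the current end-effector frame is just a change of basis. A minimal sketch, assuming the dataset provides the EE orientation in the base frame at each step and that rotation deltas are applied by pre-multiplication in the base frame (both are assumptions, since conventions vary across OXE datasets):

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def base_delta_to_ee_frame(delta_pos, delta_rotvec, ee_rotvec):
    # ee_rotvec: current EE orientation in the base frame (hypothetical input;
    # it would come from the proprioceptive state of each transition).
    Rb = R.from_rotvec(ee_rotvec).as_matrix()
    pos_ee = Rb.T @ delta_pos  # translation expressed along the EE axes
    # Change of basis for the delta rotation: R_ee = Rb^T R_delta Rb
    rot_ee = Rb.T @ R.from_rotvec(delta_rotvec).as_matrix() @ Rb
    return pos_ee, R.from_matrix(rot_ee).as_rotvec()
```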