UMass-Foundation-Model / 3D-VLA

[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
https://vis-www.cs.umass.edu/3dvla/

robot action coordinates #2

Open Todibo99 opened 7 months ago

Todibo99 commented 7 months ago

Hi, thanks for this inspiring work! I just want to know: when training the action prediction, did you unify the coordinate systems across the different datasets from OpenX?

Thanks!

anyeZHY commented 7 months ago

We haven't unified all the coordinate systems; instead, we only normalized them per dataset, as OpenX does.
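The thread doesn't show the exact normalization used, but a minimal sketch of the per-dataset, per-dimension percentile normalization common in Open X-Embodiment-style pipelines might look like this (function names and the 1st/99th percentile bounds are illustrative assumptions, not the repo's actual code):

```python
import numpy as np

def compute_normalization_stats(actions, low_pct=1.0, high_pct=99.0):
    """Per-dimension percentile statistics over ONE dataset's actions.

    `actions` has shape (N, action_dim). Percentile bounds rather than
    min/max keep the statistics robust to outlier trajectories.
    (Hypothetical helper -- 3D-VLA's exact scheme is not shown here.)
    """
    low = np.percentile(actions, low_pct, axis=0)
    high = np.percentile(actions, high_pct, axis=0)
    return low, high

def normalize_actions(actions, low, high, eps=1e-8):
    """Map each action dimension into [-1, 1] using that dataset's stats."""
    scaled = 2.0 * (actions - low) / (high - low + eps) - 1.0
    return np.clip(scaled, -1.0, 1.0)

# Each OpenX sub-dataset gets its own stats, so coordinate frames are
# never mixed -- only the numeric ranges are made comparable.
rng = np.random.default_rng(0)
dataset_a = rng.normal(0.0, 0.05, size=(1000, 7))  # e.g. xyz, rpy, gripper
low, high = compute_normalization_stats(dataset_a)
norm_a = normalize_actions(dataset_a, low, high)
print(norm_a.min(), norm_a.max())  # both within [-1, 1]
```

Note that this only rescales each dataset's action ranges to a common interval; it does not (and cannot) reconcile the underlying coordinate frames.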

We initially spent a lot of resources trying to align the action spaces, but because we lacked camera calibration, robot arm positions, etc., we couldn't derive correct transformations. The work at https://extreme-cross-embodiment.github.io attempts to make action sequences from different datasets consistent simply by flipping the x/y/z axes, but even this fails to align the coordinates.
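A quick way to see why axis flips alone can't align frames: flipping x/y/z only gives diagonal ±1 transforms, while two robot setups generally differ by an arbitrary rotation. A small sketch (the 30° rotation and displacement values are made-up illustrations):

```python
import numpy as np

# Suppose two robot bases differ by a 30-degree rotation about z
# (an arbitrary, unknown extrinsic in practice).
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

delta = np.array([0.10, 0.02, 0.00])  # an end-effector displacement
target = R @ delta                    # the same motion in the other frame

# Axis flips can only produce the 8 diagonal +/-1 transforms:
flips = [np.diag((sx, sy, sz))
         for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
best_err = min(np.linalg.norm(F @ delta - target) for F in flips)
print(best_err)  # stays well above 0: no sign flip reproduces the rotation
```

This is why, without calibration or base-pose metadata, per-dataset normalization is about the best one can do.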