Closed daniellawson9999 closed 10 months ago
This issue needs to be picked up from @snat-s as part of the effort to review datasets of interest. It also needs to include information more recent than what was happening back in June, specifically thinking of OpenRX.
If considering simulation environments, I think the Vista Driving Sim from MIT could be something useful. It "provides an interface for transforming real-world datasets into virtual environments with dynamic agents, sensor suites, and task objectives".
Closing. The dataset reconciliation effort is captured in #56, so refer to that issue for all dataset-related questions.
Background
The initially discussed dataset proposal focused on MuJoCo and Atari environments, which are useful for research but far from real-world settings. Since the release of Gato, there have been several exciting papers which train language-conditioned transformer policies for robotics. For example, a model could be trained to complete generic language tasks within some bounds, specified by prompts such as "move the green star next to the red block". The datasets and environments used by these papers could be interesting to explore with Neko. The two datasets which I introduce (Language Table and VIMA Bench) are both new and quite exciting. A model trained on these datasets may be usable for practical robotics tasks, or by those wanting to experiment with language-conditioned robotic control.
Papers and Corresponding Datasets and Environments
Several notable papers include:
RT-1: Robotics Transformer for Real-World Control at Scale
Interactive Language: Talking to Robots in Real Time
VIMA: General Robot Manipulation with Multimodal Prompts
Availability:
Thus, either VIMA (Bench) or Interactive Language (Bench) could be a great environment and dataset for us to incorporate into Gato. Personally, I slightly prefer VIMA to start with: it already seems closely tailored to Gato-style tokenization, has really good documentation, and supports multi-modal prompting through both language and images, while Language Table's model input is language.
Output
Separate issues can be created for sourcing datasets following this general direction. For the general procedure in sourcing a control dataset, refer to meta-issue https://github.com/ManifoldRG/NEKO/issues/13 (converting to Minari). Other example issues for a control dataset are https://github.com/ManifoldRG/NEKO/issues/12 and https://github.com/ManifoldRG/NEKO/issues/14 .
Feel free to discuss thoughts for this issue here, or create a separate issue for one of these individual datasets which contains more information or tracks progress towards its conversion.