ManifoldRG / NEKO

In-progress implementation of a Gato-style generalist multimodal model capable of image, text, RL, and robotics tasks
https://discord.gg/brsPnzNd8h
GNU General Public License v3.0

"Real-world" Multi-modal Robotics Control Datasets #48

Closed: daniellawson9999 closed this 10 months ago

daniellawson9999 commented 1 year ago

Background

The initially discussed dataset proposal focused on MuJoCo and Atari environments, which are useful for research but far from real-world settings. Since the release of Gato, several exciting papers have trained language-conditioned transformer policies for robotics. For example, a model could be trained to complete general tasks, within some bounds, specified by language prompts such as "move the green star next to the red block". The datasets and environments used by these papers could be interesting to explore with Neko. The two datasets I introduce (Language Table and VIMA Bench) are both new and quite exciting. A model trained on these datasets may be usable for practical robotics tasks, or by anyone wanting to experiment with language-conditioned robotic control.

Papers and Corresponding Datasets and Environments

Several notable papers include:

RT-1: Robotics Transformer for Real-World Control at Scale

Interactive Language: Talking to Robots in Real Time

VIMA: General Robot Manipulation with Multimodal Prompts

Availability:

Thus, either VIMA (VIMA Bench) or Interactive Language (Language Table) could be a great environment and dataset to incorporate into our Gato implementation. Personally, I slightly prefer starting with VIMA: it already seems closely tailored to Gato-style tokenization, has very good documentation, and supports multimodal prompting through both language and images, whereas Language Table's model input is language only.
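For readers unfamiliar with what "Gato-style tokenization" involves for control data: the Gato paper describes flattening continuous values (e.g. proprioception or joint torques), mu-law encoding them toward [-1, 1] (with mu = 100 and M = 256), discretizing into 1024 uniform bins, and shifting the resulting integers into the token range [32000, 33024). The following is a minimal Python sketch of that scheme as described in the paper. The function names are hypothetical and not part of NEKO, VIMA, or Language Table, and the `already_normalized` flag (letting the caller skip mu-law for values already in [-1, 1]) is an assumption about how to apply the paper's "if not already there" rule.

```python
import math

# Constants as described in the Gato paper for continuous-value tokenization
MU = 100.0            # mu-law companding parameter
M = 256.0             # mu-law scaling parameter
NUM_BINS = 1024       # uniform bins over [-1, 1]
TOKEN_OFFSET = 32000  # continuous tokens are shifted into [32000, 33024)


def mu_law_encode(x: float) -> float:
    """Compress a real value toward [-1, 1] using mu-law companding."""
    return math.copysign(math.log(abs(x) * MU + 1.0) / math.log(M * MU + 1.0), x)


def tokenize_continuous(values, already_normalized=False):
    """Map a flat sequence of floats to integer tokens in [32000, 33024).

    Hypothetical helper: if already_normalized is True, values are assumed to
    lie in [-1, 1] and mu-law encoding is skipped.
    """
    tokens = []
    for v in values:
        y = v if already_normalized else mu_law_encode(v)
        # Clip, since mu-law output exceeds 1 in magnitude when |x| > M
        y = min(max(y, -1.0), 1.0)
        # Uniformly bin [-1, 1]; the upper edge y == 1.0 falls into the last bin
        b = min(int((y + 1.0) / 2.0 * NUM_BINS), NUM_BINS - 1)
        tokens.append(TOKEN_OFFSET + b)
    return tokens


if __name__ == "__main__":
    # A zero joint value lands in the middle bin; extremes land at the range edges
    print(tokenize_continuous([0.0]))
    print(tokenize_continuous([-1.0, 1.0], already_normalized=True))
```

Running the script prints `[32512]` and then `[32000, 33023]`. Because the output tokens sit in a fixed, disjoint integer range, they can be interleaved with text and image tokens in a single sequence, which is part of why VIMA's multimodal prompt format maps naturally onto this kind of tokenizer.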

Output

Separate issues can be created for sourcing individual datasets in this direction. For the general procedure for sourcing a control dataset, refer to the meta-issue https://github.com/ManifoldRG/NEKO/issues/13 (converting to Minari). Other example issues for control datasets are https://github.com/ManifoldRG/NEKO/issues/12 and https://github.com/ManifoldRG/NEKO/issues/14 .

Feel free to share thoughts on this issue here, or create a separate issue for an individual dataset that contains more information or tracks progress toward its conversion.

BobakBagheri commented 1 year ago

This issue needs to be picked up by @snat-s as part of the effort to review datasets of interest. We also need to include more recent information than what was available back in June; specifically, I'm thinking of OpenRX.

AshutoshPanda2002 commented 10 months ago

If we are considering simulation environments, I think the VISTA driving simulator from MIT could be useful. It "provides an interface for transforming real-world datasets into virtual environments with dynamic agents, sensor suites, and task objectives".

BobakBagheri commented 10 months ago

Closing: the dataset reconciliation effort is captured in #56, so refer to that issue for all dataset-related discussion.