Hi, thanks for compiling this list! I'd like to bring the following works from my team to your attention:
- VIMA: General Robot Manipulation with Multimodal Prompts. ICML 2023. https://vimalabs.github.io/ (paper, code, model). This work pre-dates PaLM-E and is closely related to it.
- Prismer: A Vision-Language Model with Multi-Modal Experts. An open-source multimodal LLM from NVIDIA that pre-dates GPT-4. https://github.com/NVlabs/prismer (paper, code, model, demo).
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. NeurIPS 2022 Outstanding Paper Award. A large-scale vision-language foundation model and datasets for Minecraft. https://github.com/MineDojo/MineDojo