Hi, thanks for compiling this list! I'd like to bring the following works from my team to your attention:
- VIMA: General Robot Manipulation with Multimodal Prompts. ICML 2023. https://vimalabs.github.io/ (paper, code, model). This work pre-dates PaLM-E and is closely related to it.
- Prismer: A Vision-Language Model with Multi-Modal Experts. An open-source multimodal LLM from NVIDIA that pre-dates GPT-4. https://github.com/NVlabs/prismer (paper, code, model, demo).
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. NeurIPS 2022 Outstanding Paper Award. A large-scale vision-language foundation model and datasets for Minecraft. https://github.com/MineDojo/MineDojo