# TEOChat: Large Language and Vision Assistant for Temporal Earth Observation Data

If you like our project, please give us a star ⭐ on GitHub for the latest updates.
[![hf_space](https://img.shields.io/badge/🤗-Open%20In%20Spaces-blue.svg)](https://huggingface.co/spaces/jirvin16/TEOChat) [![arXiv](https://img.shields.io/badge/Arxiv-2410.06234-b31b1b.svg?logo=arXiv)](http://arxiv.org/abs/2410.06234)
[![License](https://img.shields.io/badge/License-Apache%202.0-yellow)](https://github.com/ermongroup/TEOChat/blob/main/LICENSE) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fermongroup%2FTEOChat&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visitor&edge_flat=false)](https://hits.seeyoufarm.com)
## 📰 News

* **[2024.10.9]** The [paper](https://arxiv.org/abs/2410.06234) and 🤗 [Hugging Face demo](https://huggingface.co/spaces/jirvin16/TEOChat) are available! Please feel free to **watch** 👀 this repository for the latest updates.

## 😮 Highlights

**TEOChat** is the first language and vision assistant that can engage in conversation about sequences of temporal earth observation imagery, and it exhibits strong performance on multiple temporal instruction-following tasks.

### 📚 TEOChatlas: A new instruction-following dataset for temporal EO data

We introduce **TEOChatlas**, a new instruction-following dataset for temporal EO data, which we use to train TEOChat. TEOChatlas contains 554,071 examples spanning dozens of temporal instruction-following tasks.

### 🤖 TEOChat: A new vision-language model for temporal EO data

We design TEOChat to use a LLaVA-style architecture, combining a temporally shared vision encoder with a LLaMA 2 LLM connected through an MLP vision-language projector (an illustrative sketch of this design is shown below, after the Main Results section).

## 🤗 Demo

### Gradio Web UI

We provide an [online demo](https://huggingface.co/spaces/jirvin16/TEOChat) on Hugging Face Spaces. You can also run the demo locally with the following command:

```bash
python videollava/serve/teochat_demo.py
```

## 🚀 Main Results

We demonstrate that TEOChat:

- outperforms a previous VLM for single EO images ([GeoChat](https://github.com/mbzuai-oryx/geochat)) and a VLM for temporal sequences of natural images ([Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA)), and also rivals specialist models on multiple tasks.
- achieves strong zero-shot performance on an EO change detection dataset and a change QA dataset.
- outperforms two strong proprietary foundation models for modeling sequences of images (GPT-4o and Gemini-1.5 Pro).
- possesses strong single-image capabilities, outperforming GeoChat on multiple zero-shot scene classification and visual question answering tasks.

### Temporal Tasks

### Zero-shot Temporal Tasks and Comparison with Proprietary Foundation Models

### Single Image Tasks

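**Illustrative architecture sketch.** For readers who want a concrete picture of the LLaVA-style design described in the Highlights section (a vision encoder shared across timesteps, an MLP vision-language projector, and a LLaMA 2 LLM), the snippet below is a minimal, hypothetical PyTorch sketch. All module choices, names, and dimensions are toy stand-ins for illustration only; they are not taken from the TEOChat codebase, which builds on Video-LLaVA.

```python
import torch
import torch.nn as nn

class TemporalLLaVASketch(nn.Module):
    """Toy stand-in for a LLaVA-style temporal pipeline: a per-frame vision encoder
    shared across timesteps, an MLP projector, and an LM over [visual tokens; text tokens]."""

    def __init__(self, vis_dim=256, llm_dim=512, vocab_size=1000):
        super().__init__()
        # Stand-in for the shared image encoder; a real model would use a ViT
        # that emits many patch tokens per frame rather than one vector.
        self.vision_encoder = nn.Linear(3 * 64 * 64, vis_dim)
        # MLP vision-language projector mapping visual features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vis_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )
        # Stand-in for the LLaMA 2 decoder: token embedding + one transformer layer + LM head.
        self.embed_tokens = nn.Embedding(vocab_size, llm_dim)
        self.decoder_layer = nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, frames, input_ids):
        # frames: (B, T, 3, 64, 64) temporal image sequence; input_ids: (B, L) prompt tokens.
        feats = self.vision_encoder(frames.flatten(start_dim=2))   # same encoder for every timestep
        visual_tokens = self.projector(feats)                      # (B, T, llm_dim)
        text_tokens = self.embed_tokens(input_ids)                 # (B, L, llm_dim)
        sequence = torch.cat([visual_tokens, text_tokens], dim=1)  # visual tokens precede the text
        return self.lm_head(self.decoder_layer(sequence))          # (B, T + L, vocab_size) logits

model = TemporalLLaVASketch()
logits = model(torch.randn(2, 4, 3, 64, 64), torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 20, 1000])
```

In the actual model, each frame contributes many projected patch tokens and the language model is a full LLaMA 2 decoder; the sketch only illustrates how the temporal frames share one encoder and how visual and text tokens are combined before decoding.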
## 🛠️ Requirements and Installation

* Python >= 3.9
* PyTorch == 2.2.1
* CUDA Version >= 12.1
* Install required packages:

```bash
git clone https://github.com/ermongroup/TEOChat.git
cd TEOChat
conda create -n teochat python=3.9 -y
conda activate teochat
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt
```

## 🗝️ Training & Validating

The training & validating instructions are in [TRAIN_AND_VALIDATE.md](TRAIN_AND_VALIDATE.md).

## 👍 Acknowledgement

* [Video-LLaVA](https://github.com/PKU-YuanGroup/Video-LLaVA): the codebase and model we built upon.
* [GeoChat](https://github.com/mbzuai-oryx/geochat): the single-image instruction-following dataset we included in TEOChatlas.

## 🔒 License

* The majority of this project is released under the Apache 2.0 license as found in the [LICENSE](https://github.com/ermongroup/TEOChat/blob/main/LICENSE) file.
* The service is a research preview intended for non-commercial use only, subject to the model [License](https://github.com/facebookresearch/llama/blob/main/MODEL_CARD.md) of LLaMA, the [Terms of Use](https://openai.com/policies/terms-of-use) of the data generated by OpenAI, and the [Privacy Practices](https://chrome.google.com/webstore/detail/sharegpt-share-your-chatg/daiacboceoaocpibfodeljbdfacokfjb) of ShareGPT. Please contact us if you find any potential violations.

## ✏️ Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil:.

```BibTeX
@article{irvin2024teochat,
  title={TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data},
  author={Irvin, Jeremy Andrew and Liu, Emily Ruoyu and Chen, Joyce Chuyi and Dormoy, Ines and Kim, Jinyoung and Khanna, Samar and Zheng, Zhuo and Ermon, Stefano},
  journal={arXiv preprint arXiv:2410.06234},
  year={2024}
}
```