The All-Seeing Project

This is the official implementation of the following papers:

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

The name "All-Seeing" is derived from "The All-Seeing Eye", which means having complete knowledge, awareness, or insight into all aspects of existence. The logo is the Millennium Puzzle, an artifact from the manga "Yu-Gi-Oh!"

News and Updates 🚀🚀🚀

July 01, 2024: All-Seeing Project v2 is accepted by ECCV 2024! Note that the model and data have already been released on Hugging Face. Try our model demo here.
Feb 28, 2024: All-Seeing Project v2 is out! Our ASMv2 achieves state-of-the-art performance across a variety of image-level and region-level tasks! See here for more details.
Feb 21, 2024: ASM, AS-Core, AS-10M, and AS-100M are released!
Jan 16, 2024: All-Seeing Project is accepted by ICLR 2024!
Aug 29, 2023: The All-Seeing Model demo is now available on OpenXLab!

Schedule

[x] Release the ASMv2 model.
[x] Release the AS-V2 dataset.
[x] Release the ASM model.
[ ] Release the full version of AS-1B.
[x] Release AS-Core, which is the human-verified subset of AS-1B.
[x] Release AS-100M, which is the 100M subset of AS-1B.
[x] Release AS-10M, which is the 10M subset of AS-1B.
[x] Online demo, including dataset browser and ASM online demo.

Introduction

The All-Seeing Project [Paper][Model][Dataset][Code][Zhihu][Medium]

All-Seeing 1B (AS-1B) dataset: we propose a new large-scale dataset (AS-1B) for open-world panoptic visual recognition and understanding, using an economical semi-automatic data engine that combines the power of off-the-shelf vision/language models and human feedback.

All-Seeing Model (ASM): we develop a unified vision-language foundation model (ASM) for open-world panoptic visual recognition and understanding. Aligning with LLMs, our ASM supports versatile image-text retrieval and generation tasks, demonstrating impressive zero-shot capability.

The All-Seeing Project V2 [Paper][Model][Dataset][Code][Zhihu][Medium]

All-Seeing Dataset V2 (AS-V2): we propose a novel task, termed Relation Conversation (ReC), which unifies the formulation of text generation, object localization, and relation comprehension. Based on this unified formulation, we construct the AS-V2 dataset, which consists of 127K high-quality relation conversation samples, to unlock the ReC capability for Multi-modal Large Language Models (MLLMs).

All-Seeing Model v2 (ASMv2): we develop ASMv2, which integrates the Relation Conversation ability while maintaining powerful general capabilities. It is endowed with grounding and referring capabilities, exhibiting state-of-the-art performance on region-level tasks. Furthermore, this model can be naturally adapted to the Scene Graph Generation task in an open-ended manner.

Circular-based Relation Probing Evaluation (CRPE) benchmark: we construct CRPE, the first benchmark that covers all elements of the relation triplets (subject, predicate, object), providing a systematic platform for evaluating relation comprehension ability.
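
To make the idea of probing a relation triplet concrete, here is a minimal sketch. This README does not spell out the CRPE protocol, so the sketch assumes that "circular" refers to rotating the answer options of a multiple-choice question and counting the question as correct only when the model picks the right answer under every rotation. All names (`RelationTriplet`, `circular_shifts`, `circular_correct`, the `choose` callable) are hypothetical and are not part of the CRPE codebase.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class RelationTriplet:
    """Hypothetical container for one (subject, predicate, object) relation."""
    subject: str
    predicate: str
    obj: str


def circular_shifts(options: Sequence[str]) -> List[List[str]]:
    """Return every rotation of the option list, starting with the original order."""
    opts = list(options)
    return [opts[i:] + opts[:i] for i in range(len(opts))]


def circular_correct(
    options: Sequence[str],
    answer: str,
    choose: Callable[[List[str]], str],
) -> bool:
    """True only if `choose` returns `answer` for every rotation of `options`.

    `choose` stands in for a model: it receives the ordered options and
    returns the text of the option it selects (an assumption of this sketch).
    """
    return all(choose(shifted) == answer for shifted in circular_shifts(options))


# Probe the predicate of one triplet with illustrative distractors.
triplet = RelationTriplet(subject="person", predicate="riding", obj="horse")
predicate_options = ["riding", "feeding", "holding", "standing next to"]

# A position-biased "model" that always picks the first option.
always_first = lambda opts: opts[0]
# An "oracle" that always picks the true predicate.
oracle = lambda opts: triplet.predicate

first_passes = circular_correct(predicate_options, triplet.predicate, always_first)
oracle_passes = circular_correct(predicate_options, triplet.predicate, oracle)
```

Under this assumption, a model that exploits option position gets no credit, while a model that actually identifies the predicate passes. The same check could be applied to the subject or the object by swapping which element of the triplet is hidden.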

License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@article{wang2023allseeing,
  title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
  author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
  journal={arXiv preprint arXiv:2308.01907},
  year={2023}
}
@article{wang2024allseeing_v2,
  title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
  author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2402.19474},
  year={2024}
}
