IDEA-Research / T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/home
Other
1.98k stars 120 forks source link
interactive object-counting object-detection open-set text-prompt visual-prompt

A picture speaks volumes, as do the words that frame it.

![Static Badge](https://img.shields.io/badge/T--Rex-2-2) [![arXiv preprint](https://img.shields.io/badge/arxiv_2403.14610-blue%3Flog%3Darxiv)](https://arxiv.org/pdf/2403.14610.pdf) [![Homepage](https://img.shields.io/badge/homepage-visit-blue)](https://deepdataspace.com/home) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FMountchicken%2FT-Rex&count_bg=%2379C83D&title_bg=%23DF9B9B&icon=iconify.svg&icon_color=%23FFF9F9&title=VISITORS&edge_flat=false)](https://hits.seeyoufarm.com) [![Static Badge](https://img.shields.io/badge/Try_Demo!-blue?logo=chainguard&logoColor=green)](https://deepdataspace.com/playground/ivp)

๐Ÿ“Œ If you find our project helpful and need more API token quotas, you can request additional tokens by filling out this form. Our team will review your request and allocate more tokens for your use in one or two days. You can also apply for more tokens by sending us an email.


Introduction Video ๐ŸŽฅ

Turn on the music if possible ๐ŸŽง

Video Name

News ๐Ÿ“ฐ

Video Name

Video Name

Contents ๐Ÿ“œ

1. Introduction ๐Ÿ“š

Object detection, the ability to locate and identify objects within an image, is a cornerstone of computer vision, pivotal to applications ranging from autonomous driving to content moderation. A notable limitation of traditional object detection models is their closed-set nature. These models are trained on a predetermined set of categories, confining their ability to recognize only those specific categories. The training process itself is arduous, demanding expert knowledge, extensive datasets, and intricate model tuning to achieve desirable accuracy. Moreover, the introduction of a novel object category, exacerbates these challenges, necessitating the entire process to be repeated.

T-Rex2 addresses these limitations by integrating both text and visual prompts in one model, thereby harnessing the strengths of both modalities. The synergy of text and visual prompts equips T-Rex2 with robust zero-shot capabilities, making it a versatile tool in the ever-changing landscape of object detection.

What Can T-Rex Do ๐Ÿ“

T-Rex2 is well-suited for a variety of real-world applications, including but not limited to: agriculture, industry, livstock and wild animals monitoring, biology, medicine, OCR, retail, electronics, transportation, logistics, and more. T-Rex2 mainly supports three major workflows including interactive visual prompt workflow, generic visual prompt workflow and text prompt workflow. It can cover most of the application scenarios that require object detection

Video Name

2. Try Demo ๐ŸŽฎ

We are now opening online demo for T-Rex2. Check our demo here

3. API Usage Examples๐Ÿ“š

We are now opening free API access to T-Rex2. For educators, students, and researchers, we offer an API with extensive usage times to support your educational and research endeavors. You can get API at here request API.

Setup

Install the API package and acquire the API token from the email.

git clone https://github.com/IDEA-Research/T-Rex.git
cd T-Rex
pip install dds-cloudapi-sdk==0.1.1
pip install -v -e .

Interactive Visual Prompt API

Generic Visual Prompt API

Customize Visual Prompt Embedding API

In this workflow, you cam customize a visual embedding for a object category using multiple images. With this embedding, you can detect on any images.

  python demo_examples/customize_embedding.py --token <your_token> 

Embedding Inference API

With the visual prompt embeddings generated from the previous API. You can use it detect on any images.

    python demo_examples/embedding_inference.py --token <your_token> 

4. Local Gradio Demo with API๐ŸŽจ

4.1. Setup

4.2. Run the Gradio Demo

python gradio_demo.py --trex2_api_token <your_token>

4.3. Basic Operations

5. Related Works

:fire: We release the training and inference code and demo link of DINOv, which can handle in-context visual prompts for open-set and referring detection & segmentation. Check it out!

BibTeX ๐Ÿ“š

@misc{jiang2024trex2,
      title={T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy}, 
      author={Qing Jiang and Feng Li and Zhaoyang Zeng and Tianhe Ren and Shilong Liu and Lei Zhang},
      year={2024},
      eprint={2403.14610},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}