About codes - Githubissues

IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

https://deepdataspace.com/blog/T-Rex

Other

2.28k stars 147 forks

About codes #77

Closed mmMm128 closed 4 months ago

mmMm128 commented 4 months ago

Hello, I have some questions about the code. Is 'category_id' the text prompt? Is it a dictionary in which the numbers represent specific categories? Also, does the visual prompt input consist of the image to be detected plus a box that marks the object of interest? I look forward to your reply. Thank you very much.

Mountchicken commented 4 months ago

Hi @mmMm128, the category ID is not related to the category name; it is only used to distinguish different visual prompts. For visual prompt detection in interactive mode, the user only needs to provide one image: we extract object features from the prompts drawn on that image and then run detection on the same image, without relying on any additional images.
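To make the answer above concrete, here is a minimal sketch of how visual prompts might be organized on the client side. It is not the T-Rex2 API: `VisualPrompt`, `build_interactive_prompts`, `name_detections`, and the `image`/`prompts`/`rects` fields are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates. What it illustrates is that `category_id` is just an integer label grouping example boxes into one prompt, with no link to a category name, and that in interactive mode the prompt boxes are drawn on the same single image that is searched.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass
class VisualPrompt:
    """Hypothetical container: one category_id groups one or more example boxes."""
    category_id: int
    boxes: List[Box] = field(default_factory=list)


def _check_box(box: Box) -> None:
    # Reject degenerate boxes so malformed prompts fail early.
    x1, y1, x2, y2 = box
    if x2 <= x1 or y2 <= y1:
        raise ValueError(f"invalid box {box}: expected x2 > x1 and y2 > y1")


def build_interactive_prompts(image_path: str, boxes_by_id: Dict[int, List[Box]]) -> dict:
    """Assemble a hypothetical interactive-mode request.

    Only one image is involved: the prompt boxes are drawn on it, and the
    same image is the detection target. The integer keys only tell the
    prompts apart; they carry no category semantics.
    """
    prompts = []
    for cid, boxes in sorted(boxes_by_id.items()):
        for box in boxes:
            _check_box(box)
        prompts.append(VisualPrompt(cid, list(boxes)))
    return {
        "image": image_path,  # hypothetical field name
        "prompts": [{"category_id": p.category_id, "rects": p.boxes} for p in prompts],
    }


def name_detections(detections: List[dict], id_to_name: Dict[int, str]) -> List[Tuple[str, float]]:
    """Attach human-readable names after detection; the names live only on the caller's side."""
    return [
        (id_to_name.get(d["category_id"], f"prompt_{d['category_id']}"), d["score"])
        for d in detections
    ]
```

For example, `build_interactive_prompts("example.jpg", {1: [(10, 20, 60, 90)], 2: [(200, 40, 260, 120)]})` describes two distinct visual prompts on one image; any names such as "apple" are mapped onto the IDs by the user afterwards, never by the IDs themselves.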

mmMm128 commented 4 months ago

Thank you very much for your answer. I have another question: I haven't found anything related to text prompts in the code. Are text prompts supported anywhere in the code?

Mountchicken commented 4 months ago

Text prompt mode is not currently supported. As an alternative, you can try GroundingDINO 1.5.

mmMm128 commented 4 months ago

Okay, thank you very much! I saw that your paper describes combining visual and text prompts. Does the code only contain visual prompts?

Mountchicken commented 4 months ago

The code contains both prompt types, but currently only the visual prompt mode is open for public access.

mmMm128 commented 4 months ago

Okay, thank you again for your answers.