IDEA-Research / T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/home

About the visual prompt #66

Closed fuweifu-vtoo closed 3 weeks ago

fuweifu-vtoo commented 3 weeks ago

In your paper, you mention: "we randomly choose between one to all available GT boxes to use as visual prompts."

Could the visual prompts selected here be from different categories?

Or do the visual prompts of T-Rex2 have to come from the same category?

Mountchicken commented 3 weeks ago

Hi @fuweifu-vtoo. Each visual prompt embedding can only come from one category.

For instance, if we consider a batch size of 2:

Suppose the first image contains three categories: A, B, and C. For each category, we randomly select between 1 and N of its instances (N being the number of available GT boxes for that category) to form that category's visual prompt embedding. The first image therefore yields three visual prompt embeddings, one each for categories A, B, and C.

Similarly, for the second image, we will have three visual prompt embeddings corresponding to categories D, E, and F.
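The per-category selection described above can be sketched in a few lines of Python. This is only an illustration, not the T-Rex2 training code: the function name `sample_visual_prompts`, the `(category, box)` annotation layout, and the box coordinates are all made up for the example, and it stops at choosing boxes (turning the chosen boxes into an embedding is the job of the model's visual prompt encoder, which is not shown).

```python
import random
from collections import defaultdict


def sample_visual_prompts(gt_boxes, rng=None):
    """Pick between 1 and N ground-truth boxes per category for one image.

    gt_boxes: list of (category, box) pairs (hypothetical layout).
    Returns a dict mapping each category to its selected boxes, so every
    prompt group contains boxes from exactly one category.
    """
    rng = rng or random.Random()

    # Group the boxes of this image by category.
    by_category = defaultdict(list)
    for category, box in gt_boxes:
        by_category[category].append(box)

    prompts = {}
    for category, boxes in by_category.items():
        k = rng.randint(1, len(boxes))  # inclusive on both ends: 1..N
        prompts[category] = rng.sample(boxes, k)
    return prompts


# A batch of two images, mirroring the A/B/C and D/E/F example.
image_1 = [
    ("A", (10, 10, 50, 50)), ("A", (60, 20, 90, 70)),
    ("B", (5, 80, 40, 120)),
    ("C", (100, 100, 150, 160)), ("C", (30, 30, 45, 55)), ("C", (0, 0, 20, 20)),
]
image_2 = [
    ("D", (12, 14, 40, 44)),
    ("E", (50, 50, 80, 90)), ("E", (85, 10, 110, 30)),
    ("F", (20, 100, 70, 140)),
]

rng = random.Random(0)
batch_prompts = [sample_visual_prompts(image, rng) for image in (image_1, image_2)]

for index, prompts in enumerate(batch_prompts, start=1):
    for category, boxes in prompts.items():
        print(f"image {index}, category {category}: {len(boxes)} box(es) selected")
```

Each image ends up with one group of boxes per category it contains, and no group ever mixes categories.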

In symbolic form:
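One possible way to write the scheme described above is given below. The notation is an assumption made for illustration and is not taken from the paper or from this thread: $G_{i,c}$ is the set of GT boxes of category $c$ in image $i$, $N_{i,c} = |G_{i,c}|$, $f$ is a stand-in for the visual prompt encoder, and the uniform choice of $k_{i,c}$ is assumed, since the thread only says "randomly".

```latex
\begin{aligned}
k_{i,c} &\sim \mathrm{Uniform}\{1, \dots, N_{i,c}\}, \\
S_{i,c} &\subseteq G_{i,c}, \quad |S_{i,c}| = k_{i,c}, \\
V_{i,c} &= f\left(S_{i,c}\right), \\[4pt]
\text{Image 1} &\;\rightarrow\; \{V_{1,A},\, V_{1,B},\, V_{1,C}\}, \\
\text{Image 2} &\;\rightarrow\; \{V_{2,D},\, V_{2,E},\, V_{2,F}\}.
\end{aligned}
```

Each $V_{i,c}$ depends only on boxes of the single category $c$, which is the constraint stated above.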

fuweifu-vtoo commented 3 weeks ago

Got it. Thanks.