IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

About the Visual Prompt #87

Closed fuweifu-vtoo closed 1 month ago

fuweifu-vtoo commented 2 months ago

Hi, @Mountchicken

Suppose the batch size is set to 2.

In the first image, categories A and B have (N1_A) and (N1_B) instances, respectively. In the second image, categories A and C have (N2_A) and (N2_C) instances, respectively.

Based on your previous answer:

My question:

  1. In issue #85, is the sentence below an imprecise description?:

    during training, we generate prompts only within the same image, meaning that the embeddings for objects like dogs and cats are used only within the current image.

  2. In batch training, suppose the first image contains instances of category C that are not labeled, and only categories A and B are labeled (non-exhaustive annotation). According to the logic above, the query embeddings of category C will serve as negative prompts for the first image. Will this be a problem?

Mountchicken commented 2 months ago

Hi @fuweifu-vtoo

  1. Indeed, that description is not precise. During batch training, we also use visual prompts from the other images in the batch.
  2. This is an unavoidable problem in object detection, caused by the annotation quality of the datasets. But the model still works.
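To make the cross-image prompt sharing concrete, here is a minimal sketch of how per-image positive and negative prompt sets could be assembled in a batch. This is illustrative only; the function and variable names are hypothetical and do not come from the T-Rex2 codebase.

```python
# Hypothetical sketch: for each image in a batch, categories labeled in
# that image act as positive prompts, while categories labeled only in
# other batch images act as negative prompts.

def build_prompt_sets(batch_labels):
    """batch_labels: list of sets, one per image, of labeled category names.
    Returns, per image, the positive and negative prompt categories."""
    all_cats = set().union(*batch_labels)
    prompt_sets = []
    for cats in batch_labels:
        prompt_sets.append({
            "pos": sorted(cats),             # categories labeled in this image
            "neg": sorted(all_cats - cats),  # categories from other images
        })
    return prompt_sets

# Batch of two images, matching the example in the question:
# image 1 is labeled {A, B}, image 2 is labeled {A, C}.
batch = [{"A", "B"}, {"A", "C"}]
sets = build_prompt_sets(batch)
# Note: image 1 receives C as a negative prompt even if it contains
# unlabeled C instances -- the non-exhaustive annotation case above.
```

This also shows why non-exhaustive annotation matters: any category labeled elsewhere in the batch but unlabeled in the current image becomes a (possibly incorrect) negative for it.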