IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

About the visual prompt #66

Closed fuweifu-vtoo closed 5 months ago

fuweifu-vtoo commented 5 months ago

In your paper, you mentioned: "we randomly choose between one to all available GT boxes to use as visual prompts."

Could the visual prompts selected here be from different categories?

Or do the visual prompts of T-Rex2 have to come from the same category?

Mountchicken commented 5 months ago

Hi @fuweifu-vtoo. Each visual prompt embedding can only come from one category.

For instance, if we consider a batch size of 2:

Suppose the first image contains categories A, B, and C. For each category, we randomly select between 1 and N instances to form that category's visual prompt embedding. Therefore, for the first image, we will have three visual prompt embeddings, corresponding to categories A, B, and C.

Similarly, if the second image contains categories D, E, and F, we will have three visual prompt embeddings for it, corresponding to those categories.

In symbolic form: image 1 → {V_A, V_B, V_C} and image 2 → {V_D, V_E, V_F}, where V_X denotes the visual prompt embedding for category X.
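
A minimal sketch of that sampling step, assuming a dict of per-category GT boxes and some box-to-embedding encoder (`gt_boxes_by_category` and `encode_boxes` are illustrative names, not the actual T-Rex2 API):

```python
import random
import torch

def sample_visual_prompts(gt_boxes_by_category, encode_boxes):
    """For each category in one image, randomly pick 1 to N of its GT boxes
    and encode them into a single visual prompt embedding."""
    prompt_embeddings = {}
    for category, boxes in gt_boxes_by_category.items():
        k = random.randint(1, boxes.shape[0])              # 1 up to all available boxes
        chosen = boxes[torch.randperm(boxes.shape[0])[:k]]  # random subset of GT boxes
        prompt_embeddings[category] = encode_boxes(chosen)  # one embedding per category
    return prompt_embeddings

# image 1 (categories A, B, C) -> {A: V_A, B: V_B, C: V_C}
# image 2 (categories D, E, F) -> {D: V_D, E: V_E, F: V_F}
```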

fuweifu-vtoo commented 5 months ago

Got it. Thanks.

lyf6 commented 2 months ago

@Mountchicken Are the embeddings for instances from the same class averaged to obtain the final embedding?

Mountchicken commented 2 months ago

@lyf6 If different instances of the same category are within a single image, we directly use the aggregator token for aggregation. If they are from different images, we calculate the average to obtain the final embedding.
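
A small illustration of the cross-image case, assuming each image's visual prompt encoder has already produced one aggregated embedding for the category (a hypothetical sketch, not the repository's code):

```python
import torch

# Suppose category A appears in 3 different images of the batch; each image's
# visual prompt encoder already produced one aggregated embedding for A
# via its aggregator token (within-image aggregation).
per_image_embeddings = [torch.randn(256) for _ in range(3)]  # e.g. embedding dim 256

# Across images, the per-image embeddings are simply averaged to obtain
# the final visual prompt embedding for category A.
final_embedding_A = torch.stack(per_image_embeddings).mean(dim=0)
```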

lyf6 commented 2 months ago

Thanks for your reply. I mean, if there are 2 bboxes for class A and 3 bboxes for class B, how do you obtain the final visual prompt embedding during training? Do you randomly choose one, or average them after self-attention?

Mountchicken commented 2 months ago

For classes A and B, we will randomly select 1 to 2 and 1 to 3 boxes as their visual prompts, respectively. The selected boxes go through the visual prompt encoder, which has multiple layers of self-attention and deformable attention, and we use the last token (the aggregator token) as the final visual prompt embedding.
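
A rough sketch of that aggregation step, assuming the selected box tokens are concatenated with a learnable aggregator token and passed through self-attention; `VisualPromptEncoder` here is hypothetical and omits the deformable attention over image features:

```python
import torch
import torch.nn as nn

class VisualPromptEncoder(nn.Module):
    """Hypothetical sketch: encodes K selected box tokens into one prompt embedding."""

    def __init__(self, dim=256, num_layers=3, num_heads=8):
        super().__init__()
        self.aggregator_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.self_attn = nn.TransformerEncoder(layer, num_layers)

    def forward(self, box_tokens):
        # box_tokens: (B, K, dim) embeddings of the K selected boxes.
        agg = self.aggregator_token.expand(box_tokens.size(0), -1, -1)
        tokens = torch.cat([box_tokens, agg], dim=1)   # append aggregator token
        tokens = self.self_attn(tokens)
        return tokens[:, -1]                           # last token = final prompt embedding
```

For example, `VisualPromptEncoder()(torch.randn(2, 3, 256))` would return a `(2, 256)` tensor, i.e. one prompt embedding per batch item.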

lyf6 commented 2 months ago

I got it, thanks very much.

lyf6 commented 2 months ago

I'm sorry, I have another question: is it necessary to sample negative bboxes to add to the visual prompts during training?

Mountchicken commented 2 months ago

Sampling negative examples can effectively mitigate the model's hallucination issue (i.e., the model not following your visual prompt and instead detecting more prominent areas in the image). Contrastive learning between positive and negative examples helps the model better distinguish the visual prompt.
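
One generic way to write such a positive/negative contrast is an InfoNCE-style loss in which an object query should score higher against its own prompt embedding than against prompts from other categories. This is only an illustrative sketch, not necessarily the exact loss or negative-sampling scheme used in T-Rex2 (how the negatives are actually sampled is exactly the question below):

```python
import torch
import torch.nn.functional as F

def prompt_contrastive_loss(query_embed, pos_prompt, neg_prompts, temperature=0.07):
    """Generic InfoNCE-style contrast: a matched object query should align with
    its own (positive) visual prompt rather than with negative prompts.

    query_embed: (dim,)    embedding of a matched object query
    pos_prompt:  (dim,)    visual prompt embedding of the query's category
    neg_prompts: (M, dim)  visual prompt embeddings from other categories
    """
    prompts = torch.cat([pos_prompt.unsqueeze(0), neg_prompts], dim=0)        # (1+M, dim)
    logits = F.normalize(query_embed, dim=0) @ F.normalize(prompts, dim=1).T  # (1+M,)
    logits = logits / temperature
    target = torch.tensor([0])                       # index 0 = the positive prompt
    return F.cross_entropy(logits.unsqueeze(0), target)
```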

lyf6 commented 2 months ago

So can you tell me how you sample the negative bboxes?