IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

contrastive embedding #59

Closed lime-s closed 6 months ago

lime-s commented 6 months ago

In your paper, after features are extracted by the Visual Prompt Encoder and the Image Encoder, the model computes the similarity between the encoder features and the prompt embeddings. Are the prompt embeddings here the V computed in the Visual Prompt Encoder? In the Visual Prompt Encoder section of the paper you write V = FFN(SelfAttn(q′))[−1]. Is that equal to C′ + B′? And what is its size?

Mountchicken commented 6 months ago

The size of each visual prompt embedding is torch.Size([1, hidden_dim]).
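
To make the shape concrete, here is a minimal sketch of how a single prompt embedding of shape (1, hidden_dim) could be compared against a set of encoder features. It is not taken from the T-Rex2 code: the value of `hidden_dim`, the feature values, the helper `dot_similarity`, and the choice of a plain dot product as the similarity are all assumptions made for illustration. It uses plain Python lists so it runs without PyTorch; with tensors the same computation is a single matrix product, `features @ prompt.T`.

```python
# Hypothetical hidden size; the real value is not stated in this thread.
hidden_dim = 4

# One visual prompt embedding V, shape (1, hidden_dim).
prompt = [[0.5, -1.0, 2.0, 0.0]]

# Hypothetical encoder features, shape (num_queries, hidden_dim).
features = [
    [0.5, -1.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [-0.5, 1.0, -2.0, 0.0],
]


def dot_similarity(features, prompt):
    """Return a (num_queries, num_prompts) matrix of dot products.

    Assumption: similarity is an unnormalized dot product.
    """
    return [
        [sum(f * p for f, p in zip(row, emb)) for emb in prompt]
        for row in features
    ]


scores = dot_similarity(features, prompt)
print(len(scores), len(scores[0]))  # number of queries, number of prompts
print(scores)
```

Because each visual prompt contributes one row of size hidden_dim, the score matrix has one column per prompt and one row per encoder feature.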