UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-Context Prompting"

Prompts don't inform mask proposal? #28

Open JoshMSmith44 opened 1 week ago

JoshMSmith44 commented 1 week ago

My understanding from the paper and the decoder code is that the generic object queries become mask/box proposals without ever interacting with the prompt embeddings: the shared decoder applies an attention mask so that generic queries cannot attend to the content queries, and therefore never see any prompt information. Is this correct? In my own tests, DINOv struggles on datasets that are less "object-centric" than COCO.
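To make the mechanism concrete, here is a minimal sketch of the masking I mean (tensor names, sizes, and the plain `nn.MultiheadAttention` usage are illustrative assumptions on my part, not the actual DINOv code):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not DINOv's actual configuration.
num_generic, num_content, dim = 300, 16, 256
total = num_generic + num_content

# Generic object queries (mask/box proposals) and content (prompt)
# queries are concatenated and run through one shared decoder layer.
queries = torch.randn(1, total, dim)

# Boolean attention mask: True means "do not attend".
# Rows index target queries, columns index source queries.
attn_mask = torch.zeros(total, total, dtype=torch.bool)
# Block generic queries (rows 0..num_generic) from attending to
# content queries (columns num_generic..total), so the proposals
# are produced without any prompt information.
attn_mask[:num_generic, num_generic:] = True

self_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
out, _ = self_attn(queries, queries, queries, attn_mask=attn_mask)
```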

FengLi-ust commented 5 days ago

Yes, your understanding is correct. For open-set detection, our model can perform better than text-query-based models, as shown in our experiments. You may need to provide more visual examples for the model to learn the visual prompt concept; in our experience, 8-16 examples perform best.
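As a rough illustration of aggregating several examples into one visual prompt (the shapes and the masked-average pooling below are a sketch only, not our exact implementation):

```python
import torch

# Hypothetical shapes: per-example image features and binary masks
# marking the prompted region in each visual example.
feats = torch.randn(12, 256, 64, 64)           # 12 examples, C=256, 64x64 grid
masks = (torch.rand(12, 1, 64, 64) > 0.5).float()

# Masked average pooling: one embedding per visual example.
per_example = (feats * masks).sum(dim=(2, 3)) / masks.sum(dim=(2, 3)).clamp(min=1)

# Aggregate across examples into one visual prompt embedding;
# more examples (8-16) give a more stable concept representation.
visual_prompt = per_example.mean(dim=0, keepdim=True)  # shape (1, 256)
```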

JoshMSmith44 commented 5 days ago

Thank you!

eisp-tgq commented 3 days ago

> My understanding from the paper and the decoder code is that the generic object queries become mask/box proposals without ever interacting with the prompt embeddings: the shared decoder applies an attention mask so that generic queries cannot attend to the content queries, and therefore never see any prompt information. Is this correct? In my own tests, DINOv struggles on datasets that are less "object-centric" than COCO.

Hello, sorry to bother you. I have recently been trying to reproduce this code on my own dataset and have run into some problems. Could I get in touch with you to discuss them? May I ask for your contact information? Thank you!