UX-Decoder / DINOv

[CVPR 2024] Official implementation of the paper "Visual In-Context Prompting"

Prompts don't inform mask proposal? #28

Open JoshMSmith44 opened 1 week ago

JoshMSmith44 commented 1 week ago

My understanding from the paper and the decoder code is that the generic object queries become mask/box proposals without ever interacting with the prompt embeddings: the shared decoder applies an attention mask so that generic queries cannot attend to the content queries, and therefore never see any prompt information. Is this correct? In my own tests, DINOv struggles on datasets that are less "object-centric" than COCO.
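To make the mechanism concrete, here is a minimal sketch of the masking I mean (tensor names, sizes, and the plain `nn.MultiheadAttention` usage are illustrative assumptions on my part, not the actual DINOv code):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not DINOv's actual configuration.
num_generic, num_content, dim = 300, 16, 256
total = num_generic + num_content

# Generic object queries (mask/box proposals) and content (prompt)
# queries are concatenated and run through one shared decoder layer.
queries = torch.randn(1, total, dim)

# Boolean attention mask: True means "do not attend".
# Rows index target queries, columns index source queries.
attn_mask = torch.zeros(total, total, dtype=torch.bool)
# Block generic queries (rows 0..num_generic) from attending to
# content queries (columns num_generic..total), so the proposals
# are produced without any prompt information.
attn_mask[:num_generic, num_generic:] = True

self_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
out, _ = self_attn(queries, queries, queries, attn_mask=attn_mask)
```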

FengLi-ust commented 5 days ago

Yes, your understanding is correct. For open-set detection, our model can perform better than text-query-based models, as shown in our experiments. You may need to provide more visual examples for the model to learn the visual prompt concept; in our experience, 8-16 examples perform best.
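As a rough illustration of aggregating several examples into one visual prompt (the shapes and the masked-average pooling below are a sketch only, not our exact implementation):

```python
import torch

# Hypothetical shapes: per-example image features and binary masks
# marking the prompted region in each visual example.
feats = torch.randn(12, 256, 64, 64)           # 12 examples, C=256, 64x64 grid
masks = (torch.rand(12, 1, 64, 64) > 0.5).float()

# Masked average pooling: one embedding per visual example.
per_example = (feats * masks).sum(dim=(2, 3)) / masks.sum(dim=(2, 3)).clamp(min=1)

# Aggregate across examples into one visual prompt embedding;
# more examples (8-16) give a more stable concept representation.
visual_prompt = per_example.mean(dim=0, keepdim=True)  # shape (1, 256)
```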

JoshMSmith44 commented 5 days ago

Thank you!

eisp-tgq commented 3 days ago

> My understanding from the paper and the decoder code is that the generic object queries become mask/box proposals without ever interacting with the prompt embeddings: the shared decoder applies an attention mask so that generic queries cannot attend to the content queries, and therefore never see any prompt information. Is this correct? In my own tests, DINOv struggles on datasets that are less "object-centric" than COCO.

Hello, sorry to bother you. I have recently been trying to reproduce this code on my own dataset and have run into some problems. Could I get in touch with you to discuss them? May I ask for your contact information? Thank you!