JoshMSmith44 opened 1 week ago
Yes, your understanding is correct. For open-set detection, our model can perform better with visual prompts than with text queries, as shown in our experiments. You may need to provide more visual examples for the model to learn the visual prompt concept; for instance, 8-16 examples perform best.
Thank you!
My understanding from the paper and the decoder code is that the generic object queries become mask/box proposals without interacting with the prompt embeddings. The shared decoder masks out the generic queries so that they don't see the content queries, and therefore don't see any prompt information. Is this correct? In my own tests, DINOv struggles with datasets that are less "objecty" than COCO.
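If that reading is right, the attention-masking pattern could be sketched as below. This is a minimal illustration of the pattern described above, not the repo's actual code; `build_decoder_attn_mask` is a hypothetical helper, and the convention that `True` marks a blocked position is an assumption.

```python
def build_decoder_attn_mask(num_generic, num_content):
    """Sketch of a decoder self-attention mask where generic object
    queries are blocked from attending to content (prompt) queries.
    mask[i][j] == True means query i may NOT attend to key j."""
    n = num_generic + num_content
    mask = [[False] * n for _ in range(n)]
    for i in range(num_generic):          # rows: generic queries
        for j in range(num_generic, n):   # cols: content queries
            mask[i][j] = True             # generic -> content is blocked
    return mask

# With 3 generic and 2 content queries, generic rows cannot see
# content columns, so generic proposals carry no prompt information.
mask = build_decoder_attn_mask(3, 2)
```

Under this sketch, content queries can still attend everywhere, so the prompt branch sees the generic proposals while the generic branch stays prompt-agnostic.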
Hello, sorry to bother you. I've recently been trying to reproduce this code on my own dataset and have run into some problems. Could I get in touch with you? Could you share your contact information? Thank you!