FoundationVision / GLEE

[CVPR2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
https://glee-vision.github.io/
MIT License
1.06k stars 82 forks

Detail about object detection decoder. #16

Open loveltyoic opened 6 months ago

loveltyoic commented 6 months ago

Hi there. I think GLEE is great work, and thanks for open-sourcing it! I have a question about object detection: what is the input to the decoder when GLEE is used as an object detector? Do the object queries need to include box positions from anchor boxes? If I'm not mistaken, MaskDINO takes box positions from anchors, along with masks, as its object queries. So what do the object queries look like in GLEE when it is used as an object detector? Looking forward to your reply, thanks a lot!