IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

Difference w.r.t GLEE? #39

Closed ifsheldon closed 7 months ago

ifsheldon commented 8 months ago

Hi! Great work!

It seems that a concurrent work, GLEE, is very similar to yours, and some of the results reported in their paper appear to be even better. Can you elaborate on the differences or compare the results? Thanks!

Mountchicken commented 8 months ago

Hi @ifsheldon Thanks for your interest in our work. GLEE is a great work, but it differs from T-Rex2. GLEE does indeed support both text prompts and visual prompts as input. However, GLEE uses visual prompts only for interactive segmentation, which is a one-to-one task (one prompt yields one mask). In T-Rex2, the visual prompt drives generic object detection, which is a one-to-many task (one prompt yields all matching instances).

As for the metrics reported on COCO and LVIS: GLEE uses both COCO and LVIS data for training, whereas T-Rex2 is evaluated on COCO and LVIS in a zero-shot setting, i.e., we do not use them for training. So the two sets of numbers are not directly comparable.
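To make the one-to-one vs. one-to-many distinction concrete, here is a minimal toy sketch (this is not the T-Rex2 or GLEE API; the data structures, function names, and cosine-similarity matching are all illustrative assumptions): interactive segmentation returns only the prompted instance, while generic detection uses the prompt's embedding to retrieve every similar instance in the image.

```python
# Hypothetical sketch, NOT the real T-Rex2/GLEE APIs.
# Toy image: each candidate instance is a (box, embedding) pair.
instances = [
    ((10, 10, 50, 50), (1.0, 0.1)),  # a "cat"
    ((60, 10, 90, 40), (0.9, 0.2)),  # another "cat"-like instance
    ((10, 60, 40, 90), (0.1, 1.0)),  # a "dog"
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def interactive_segmentation(prompt_box):
    """One-to-one: the visual prompt selects exactly the prompted instance."""
    return [box for box, _ in instances if box == prompt_box][:1]

def generic_detection(prompt_box, threshold=0.9):
    """One-to-many: the prompt's embedding retrieves ALL similar instances."""
    prompt_emb = next(emb for box, emb in instances if box == prompt_box)
    return [box for box, emb in instances
            if cosine(emb, prompt_emb) >= threshold]

# Prompting on the first "cat" box:
seg = interactive_segmentation((10, 10, 50, 50))  # one mask/box back
det = generic_detection((10, 10, 50, 50))         # both "cat"-like boxes back
print(len(seg), len(det))  # → 1 2
```

The design difference this illustrates: in the one-to-many setting, a single box prompt acts like an implicit category query, so the model must generalize the prompt's appearance to unprompted instances rather than just segment the region it was given.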

ifsheldon commented 7 months ago

Got it! Thanks a lot for the explanation