Hi @ifsheldon, thanks for your interest in our work. GLEE is great work, but it is different from T-Rex2. GLEE does support both text prompts and visual prompts as input. However, GLEE uses visual prompts only for interactive segmentation, which is a one-to-one task (one prompt yields one mask). In T-Rex2, visual prompts are used for generic object detection, which is a one-to-many task (one prompt yields all matching instances). As for the reported metrics on COCO and LVIS, GLEE uses both COCO and LVIS data for training, whereas we evaluate T-Rex2 on COCO and LVIS in a zero-shot setting, i.e., we do not use them for training. So the two sets of results are not directly comparable.
Got it! Thanks a lot for the explanation.
Hi! Great work!
It seems that a concurrent work, GLEE, is very similar to yours. Some of the results reported in their paper even appear to be better. Could you elaborate on the differences or compare the results? Thanks!