IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

bad result on unexpected prompt #8

Closed: spacewalk01 closed this issue 12 months ago

spacewalk01 commented 12 months ago

Thank you for your wonderful work. I tried it on some images and got excellent results on common classes. However, when I prompted it with an uncommon object like 'rock', it detected the wrong objects (common objects in the image): [image]

Mountchicken commented 12 months ago

Hi @spacewalk01. We have also noticed this potential over-fitting issue in T-Rex. When given only one prompt, T-Rex tends to detect the salient objects in the image and ignore the user's prompt. In that case, you can provide additional prompts, and most of the time T-Rex will then understand the user's intention. We are currently working on improving this. [image]
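For readers who want to see what "one prompt versus several prompts" could look like in code, here is a minimal sketch. It is an illustration only, not the T-Rex API: `VisualPrompt` and `build_prompts` are hypothetical names, and it assumes box-style visual prompts given as pixel coordinates `(x1, y1, x2, y2)`. It only collects and validates the prompts; sending them to the model is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class VisualPrompt:
    """Hypothetical container for one box prompt (not a T-Rex type)."""
    box: Box
    label: str


def build_prompts(boxes: Sequence[Box], label: str,
                  image_size: Tuple[int, int]) -> List[VisualPrompt]:
    """Collect one or more box prompts for the same target class.

    Raises ValueError if a box is degenerate or falls outside the image.
    """
    width, height = image_size
    prompts = []
    for x1, y1, x2, y2 in boxes:
        if not (0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height):
            raise ValueError(f"invalid box: {(x1, y1, x2, y2)}")
        prompts.append(VisualPrompt(box=(x1, y1, x2, y2), label=label))
    return prompts


# A single prompt: the case where salient objects may be detected instead.
single = build_prompts([(40, 60, 120, 140)], "rock", (640, 480))

# Several prompts on different instances of the same uncommon object.
several = build_prompts(
    [(40, 60, 120, 140), (300, 200, 380, 290), (500, 350, 600, 440)],
    "rock",
    (640, 480),
)
```

The idea is simply that `several` marks more than one example of the intended object, which is the workaround described above for uncommon classes.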

spacewalk01 commented 12 months ago

After I provided a few more prompts (feedback), it was able to fix its output. Nice work! [image]

spacewalk01 commented 12 months ago

I see. So we can provide a few prompts on the first image, but what happens with the next images? Should we do the same for each one? I am glad to hear you are working on improving it. I also tried it on my own training images for detecting people, and it did excellent work!

Mountchicken commented 12 months ago

It's hard to tell. In my experience, if the image is complex, i.e. it contains a lot of objects, then providing more prompts will give you higher accuracy.