IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/blog/T-Rex

about generating visual prompts. #74

Closed mengyu212 closed 4 months ago

mengyu212 commented 4 months ago

Thank you for your great work! Recently, we found a problem while trying to replicate it. When we use the same generated visual prompts to infer on the target image, the performance online (https://www.deepdataspace.com/playground/ovp) is better than that of the released code. Is the process of generating the visual prompts online the same as in the code? (We first run customize_embedding.py to generate the .safetensors file, and then run embedding_inference.py to get the result.) Looking forward to your reply.

Mountchicken commented 4 months ago

Hi @mengyu212

In T-Rex2 and OVP, the methods for generating visual prompt embeddings are different. In T-Rex2, obtaining a visual prompt embedding only requires a forward pass of the model. In OVP, we use an optimization method: we initialize a visual prompt embedding and then train that embedding on the user's provided images so that it fits a specific dataset. In short, one method requires training, while the other does not.
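The distinction above can be sketched roughly as follows. This is a toy illustration only, not the actual T-Rex2 or OVP code: the function names, the feature representation, and the logistic loss are all assumptions made for the example. The first function stands in for T-Rex2's single forward pass; the second mimics OVP's approach of initializing an embedding and refining it by gradient descent on the user's labeled image features.

```python
import numpy as np

def forward_inference_embedding(region_feats):
    # T-Rex2-style (toy): one forward pass, here a simple average of the
    # visual features extracted from the user-drawn prompt regions.
    return region_feats.mean(axis=0)

def optimized_embedding(image_feats, labels, dim, steps=200, lr=0.1):
    # OVP-style (toy): initialize an embedding, then train it so that its
    # dot product with positive-region features is high and with negative
    # ones low, using a logistic loss and plain gradient descent.
    rng = np.random.default_rng(0)
    emb = rng.normal(scale=0.01, size=dim)
    for _ in range(steps):
        logits = image_feats @ emb            # similarity scores
        probs = 1.0 / (1.0 + np.exp(-logits)) # sigmoid
        grad = image_feats.T @ (probs - labels) / len(labels)
        emb -= lr * grad                      # gradient step on the embedding
    return emb
```

The key design difference is that the optimized embedding sees the whole user-provided dataset during fitting, so it can adapt to that dataset's appearance statistics, whereas the forward-pass embedding is fixed by the encoder and the prompt regions alone.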

mengyu212 commented 4 months ago

Got it, thanks for your reply! May I ask whether you will open-source the OVP code later? Its performance is excellent! Or perhaps provide an API like T-Rex2's; I want to incorporate this module into my ongoing research.

Mountchicken commented 4 months ago

We have no plans to open-source the code for now, but we may work on API development for OVP. Please stay tuned!