IDEA-Research / T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
https://deepdataspace.com/home

about generating visual prompts. #74

Open mengyu212 opened 3 days ago

mengyu212 commented 3 days ago

Thank you for your great work! Recently, we found a problem while trying to replicate this work. When we use the same generated visual prompts to run inference on the target image, the performance online (https://www.deepdataspace.com/playground/ovp) is better than that of the released code. We wonder whether the process of generating the visual prompts online is the same as in the code? (We first run customize_embedding.py to generate the .safetensors file, and then run embedding_inference.py to get the result.) Looking forward to your reply.

Mountchicken commented 3 days ago

Hi @mengyu212

In T-Rex2 and OVP, the methods for generating visual prompt embeddings are different. In T-Rex2, obtaining a visual prompt embedding only requires a forward pass of the model. In OVP, we use an optimization method: we initialize a visual prompt embedding and then train that embedding on the user's provided images, so that it fits a specific dataset. In short, one method requires training, while the other does not.
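To make the contrast concrete, here is a minimal toy sketch (not the actual T-Rex2/OVP code; the feature shapes, the loss, and the pooling are all illustrative assumptions) of the two ways an embedding can be obtained: one by a single forward computation over region features, the other by initializing an embedding and optimizing it with gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for visual features extracted from the user's prompt boxes.
region_feats = rng.normal(size=(8, 16))

# "T-Rex2 style" (illustrative): one forward computation, e.g. pooling
# the region features into a single prompt embedding.
forward_embedding = region_feats.mean(axis=0)

# "OVP style" (illustrative): initialize an embedding and optimize it
# against a loss on the user's data. Here the toy loss is the mean
# squared distance to each region feature, minimized by gradient descent.
embedding = rng.normal(size=16)
lr = 0.1
for _ in range(200):
    # d/de of mean_i ||e - f_i||^2  =  2 * (e - mean_i f_i)
    grad = 2 * (embedding - region_feats).mean(axis=0)
    embedding -= lr * grad
```

For this toy quadratic loss the optimized embedding converges to the pooled one; with a real detection loss on real images, the trained embedding can fit the target dataset in ways a single forward pass cannot, which matches the performance gap observed above.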

mengyu212 commented 2 days ago

Got it! Thanks for your reply! May I ask whether you will open-source the OVP code later? Its performance is excellent! Or perhaps provide an API like T-Rex2's? I want to incorporate this module into my ongoing research.

Mountchicken commented 2 days ago

We have no plan to open-source the code for now, but we may work on the API development for OVP. Please stay tuned!