Hi @mengyu212
In T-Rex2 and OVP, the methods for generating visual prompt embeddings are different. In T-Rex2, obtaining a visual prompt embedding only requires a forward pass of the model. In OVP, we use an optimization method: we initialize a visual prompt embedding and then train this embedding on the user's provided images so that it fits the specific dataset. In short, one method requires training while the other does not.
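For context, here is a minimal sketch of what such optimization-based prompt tuning could look like. The `FrozenDetector`, the loss, and the toy data are hypothetical placeholders (the actual OVP code is not public); only the prompt embedding receives gradients, while the detector weights stay frozen:

```python
import torch
import torch.nn as nn
from safetensors.torch import save_file

# Placeholder standing in for the frozen detector; not the real OVP model.
class FrozenDetector(nn.Module):
    def __init__(self, embed_dim: int):
        super().__init__()
        self.backbone = nn.Linear(embed_dim, embed_dim)
        for p in self.parameters():
            p.requires_grad_(False)  # detector weights stay frozen

    def forward(self, image_feats, prompt):
        # Score image features against the visual prompt embedding.
        return image_feats @ self.backbone(prompt).T

embed_dim = 256                                          # assumed embedding size
detector = FrozenDetector(embed_dim)
prompt = torch.randn(1, embed_dim, requires_grad=True)   # initialized visual prompt embedding
optimizer = torch.optim.AdamW([prompt], lr=1e-3)

# Toy stand-in for the user's provided images: random features with binary labels.
image_feats = torch.randn(32, embed_dim)
labels = torch.randint(0, 2, (32, 1)).float()

for step in range(100):
    logits = detector(image_feats, prompt)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()                                      # gradients flow only into the prompt
    optimizer.step()

# Persist the tuned embedding, analogous to the .safetensors file mentioned later in this thread.
save_file({"visual_prompt": prompt.detach().contiguous()}, "visual_prompt.safetensors")
```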
Got it! Thanks for your reply! May I ask whether you plan to open-source the OVP code later? Its performance is excellent! Alternatively, an API like T-Rex2's would be great; I want to incorporate this module into my ongoing research.
We have no plan to open-source the code for now, but we may work on API development for OVP. Please stay tuned!
Thank you for your great work! Recently, we ran into a problem while trying to replicate this work. When we use the same generated visual prompts to infer on the target image, the performance online (https://www.deepdataspace.com/playground/ovp) is better than that of the released code. Is the process of generating the visual prompts online the same as in the code? (We first run customize_embedding.py to generate the .safetensors file, and then run embedding_inference.py to infer the result.) Looking forward to your reply.
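For reference, the loading half of that two-step pipeline might look roughly like the sketch below. The tensor key `"visual_prompt"` and the similarity-scoring step are assumptions for illustration, not the actual behavior of embedding_inference.py; only `safetensors.torch.load_file` is the real library call used to read the file:

```python
import torch
from safetensors.torch import load_file

# Load the embedding saved in the first step; the key name is an assumption.
tensors = load_file("visual_prompt.safetensors")
prompt = tensors["visual_prompt"]                     # shape (1, embed_dim)

# Hypothetical inference: score toy target-image features against the prompt.
with torch.no_grad():
    image_feats = torch.randn(4, prompt.shape[-1])    # stand-in for target-image features
    scores = image_feats @ prompt.T                   # similarity between regions and prompt
print(scores.shape)                                   # (4, 1)
```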