WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://vip-llava.github.io/
Apache License 2.0

How to use visual prompts with huggingface? #6

Closed joshmyersdean closed 4 months ago

joshmyersdean commented 4 months ago

Question

Thank you for the great work! What is the preferred way to feed in visual prompts when using the Hugging Face endpoint?

mu-cai commented 4 months ago

Hi Josh, great question!

The simplest approach is to:

  1. Use the official Hugging Face endpoint code: https://huggingface.co/llava-hf/vip-llava-13b-hf
  2. Overlay the visual prompt onto the image before passing it to the model. Example code can be found here: https://github.com/mu-cai/ViP-LLaVA/blob/main/llava/visual_prompt_generator.py#L270 (see the sketch below)
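A minimal end-to-end sketch of those two steps, assuming the `llava-hf/vip-llava-13b-hf` checkpoint and the prompt template shown on its model card; the ellipse coordinates, image path, and question are placeholders:

```python
import torch
from PIL import Image, ImageDraw
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

# Step 2: overlay a visual prompt (here, a red ellipse) onto the image.
image = Image.open("example.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.ellipse((120, 80, 260, 210), outline="red", width=4)  # placeholder coordinates

# Step 1: run the annotated image through the Hugging Face checkpoint.
model_id = "llava-hf/vip-llava-13b-hf"
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Prompt format as documented on the model card.
question = "What is shown within the red ellipse?"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```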

Let me know if you need more detailed assistance!

joshmyersdean commented 4 months ago

This was very helpful, thank you!

mu-cai commented 3 months ago

Or check out this part of the README! You can also pass a bounding box directly as the visual prompt, which makes visual prompt generation much easier!
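To illustrate the bounding-box route, here is a small sketch; the helper below is not the repo's API, just an assumed example of turning an `(x1, y1, x2, y2)` box into a drawn rectangle prompt in the spirit of `visual_prompt_generator.py`:

```python
from PIL import Image, ImageDraw

def draw_bbox_prompt(image: Image.Image, bbox, color="red", width=4) -> Image.Image:
    """Draw a rectangle visual prompt from an (x1, y1, x2, y2) bounding box.

    Illustrative helper only; the repo's visual_prompt_generator.py supports
    more shapes (e.g., ellipses, arrows, scribbles) and blending options.
    """
    out = image.copy()
    ImageDraw.Draw(out).rectangle(bbox, outline=color, width=width)
    return out

# Usage: annotate a region, then feed the result to the model as above.
annotated = draw_bbox_prompt(Image.open("example.jpg").convert("RGB"), (120, 80, 260, 210))
annotated.save("example_with_bbox.png")
```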