WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://vip-llava.github.io/
Apache License 2.0

How to use visual prompts with huggingface? #6

Closed joshmyersdean closed 4 months ago

joshmyersdean commented 4 months ago

Question

Thank you for the great work! What is the preferred way to feed in visual prompts when using the Hugging Face endpoint?

mu-cai commented 4 months ago

Hi Josh, great question!

The simplest approach is to:

  1. Use the official Hugging Face endpoint code: https://huggingface.co/llava-hf/vip-llava-13b-hf
  2. Overlay the visual prompt onto the image before passing it to the model. Example code can be found here: https://github.com/mu-cai/ViP-LLaVA/blob/main/llava/visual_prompt_generator.py#L270 (see the sketch below)
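A minimal end-to-end sketch of those two steps, assuming the `llava-hf/vip-llava-13b-hf` checkpoint and the prompt template shown on its model card; the ellipse coordinates, image path, and question are placeholders:

```python
import torch
from PIL import Image, ImageDraw
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

# Step 2: overlay a visual prompt (here, a red ellipse) onto the image.
image = Image.open("example.jpg").convert("RGB")
draw = ImageDraw.Draw(image)
draw.ellipse((120, 80, 260, 210), outline="red", width=4)  # placeholder coordinates

# Step 1: run the annotated image through the Hugging Face checkpoint.
model_id = "llava-hf/vip-llava-13b-hf"
model = VipLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Prompt format as documented on the model card.
question = "What is shown within the red ellipse?"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```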

Let me know if you need more detailed assistance!

joshmyersdean commented 4 months ago

This was very helpful, thank you!

mu-cai commented 3 months ago

Or check out this part of the README! You can also pass a bounding box directly as the visual prompt, which makes visual prompt generation much easier!
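To illustrate the bounding-box route, here is a small sketch; the helper below is not the repo's API, just an assumed example of turning an `(x1, y1, x2, y2)` box into a drawn rectangle prompt in the spirit of `visual_prompt_generator.py`:

```python
from PIL import Image, ImageDraw

def draw_bbox_prompt(image: Image.Image, bbox, color="red", width=4) -> Image.Image:
    """Draw a rectangle visual prompt from an (x1, y1, x2, y2) bounding box.

    Illustrative helper only; the repo's visual_prompt_generator.py supports
    more shapes (e.g., ellipses, arrows, scribbles) and blending options.
    """
    out = image.copy()
    ImageDraw.Draw(out).rectangle(bbox, outline=color, width=width)
    return out

# Usage: annotate a region, then feed the result to the model as above.
annotated = draw_bbox_prompt(Image.open("example.jpg").convert("RGB"), (120, 80, 260, 210))
annotated.save("example_with_bbox.png")
```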