WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://vip-llava.github.io/
Apache License 2.0

Training Stage #5

Closed — 980044579 closed this issue 9 months ago

980044579 commented 9 months ago

Question

Very exciting work! How long do the "Visual Instruction Tuning" and "Finetuning on GPT-4V Instruction Data" stages take?