-
When I run
`bash scripts/video/demo/video_demo.sh ${the path of LLaVA-NeXT-Video-7B-DPO} vicuna_v1 32 2 True ${the path of video}`
I get the error
```
Can't set vocab_size with value 32000 for …
```
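A quick way to check whether this comes from a mismatch between the checkpoint's `vocab_size` and the tokenizer is to compare the two directly. This is only a sketch: the model path is a placeholder for your local download, and `config.json` is read as plain JSON in case your transformers version does not recognise the custom model type.

```python
import json
from transformers import AutoTokenizer

# Placeholder path to the downloaded checkpoint; substitute your own.
model_path = "/path/to/LLaVA-NeXT-Video-7B-DPO"

# Read vocab_size straight from config.json so this works even if
# transformers does not recognise the repo's custom model_type.
with open(f"{model_path}/config.json") as f:
    config = json.load(f)

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# If these two numbers disagree (e.g. 32000 vs. a larger tokenizer after
# special tokens were added), that mismatch is the usual trigger for
# errors like the one above.
print("config.json vocab_size:", config.get("vocab_size"))
print("len(tokenizer):        ", len(tokenizer))
```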
-
### The model to consider.
The llava-next-video project has already been released, and the test results are quite good. Are there any plans to support this project?
`https://github.com/LLaVA-VL/LLaV…
-
Hi Team,
I saw that LLaVA-NeXT-Video-32B-Qwen obtains 77.31% and 63% accuracy on NExT-QA and EgoSchema, respectively, here: https://huggingface.co/lmms-lab/LLaVA-NeXT-Video-32B-Qwen.
On the other hand, LLaVA-NeXT…
-
Great work! I notice the LLaVA-NeXT-Qwen2 (image model) can achieve a surprising Video-MME result of 49.5. In contrast, LLaVA-NeXT-Video (Llama3) only achieves a 30+ Video-MME score (according to…
-
I cloned the "lmms-lab/LLaVA-NeXT-Interleave-Bench" dataset and the "llava-onevision-qwen2-7b-ov" checkpoint from Hugging Face to reproduce the results of the paper, but some benchmark results seem to be v…
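For reference, a minimal sketch of pulling both repos locally with `huggingface_hub` before running the evaluation; the checkpoint repo ID is assumed to live under `lmms-lab`, and the local directories are placeholders.

```python
from huggingface_hub import snapshot_download

# Download the benchmark data (a dataset repo) and the checkpoint (a model repo).
# local_dir values are placeholders; adjust to your setup.
snapshot_download(
    repo_id="lmms-lab/LLaVA-NeXT-Interleave-Bench",
    repo_type="dataset",
    local_dir="data/LLaVA-NeXT-Interleave-Bench",
)
snapshot_download(
    repo_id="lmms-lab/llava-onevision-qwen2-7b-ov",
    repo_type="model",
    local_dir="checkpoints/llava-onevision-qwen2-7b-ov",
)
```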
-
I tested the batch inference results of the llava and llava-next-video models using tensorrt-llm based on the examples/multimodal/run.py file. The parameters for their generate method are the same, as…
-
Hi, thanks for your great work. I was wondering how many GPUs are needed to train LLaVA-NeXT with a 72B LLM.
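Not an official answer, but a rough back-of-envelope sketch of the memory footprint for fully fine-tuning a 72B LLM with Adam in bf16 mixed precision (ignoring activations, the vision tower, and framework overhead) suggests the model and optimizer states alone already require many 80GB GPUs even when fully sharded.

```python
# Rough back-of-envelope estimate, illustrative only; not the authors' recipe.
params = 72e9

bytes_per_param = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 4  # fp32 master weights
    + 4  # Adam first moment (fp32)
    + 4  # Adam second moment (fp32)
)

total_gb = params * bytes_per_param / 1e9
gpu_memory_gb = 80  # e.g. A100/H100 80GB

print(f"model + optimizer states: ~{total_gb:.0f} GB")
print(f"minimum 80GB GPUs for these states alone (fully sharded): "
      f"{total_gb / gpu_memory_gb:.0f}+")
```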
-
Hi, this is really nice work that shows the potential of embedding anything using LLMs.
In Section 3.1, you explain that, via a summary prompt, both vision and text can be embedded into the next token. A…
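For readers following along, here is a minimal text-only sketch of the "summary prompt → next-token embedding" idea using a plain causal LM from transformers; the model name and prompt wording are placeholders rather than the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper's setup uses a multimodal LMM instead.
model_name = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

text = "A dog playing in the snow."
# Hypothetical summary prompt: ask the model to compress the input into one token.
prompt = f"{text}\nSummarize the above in one word:"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Last layer, last position: the hidden state that would predict the "summary"
# token is taken as the embedding of the input.
embedding = outputs.hidden_states[-1][:, -1, :]
print(embedding.shape)  # (1, hidden_size)
```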
-
Hello,
I am very interested in your work and was wondering when you might be providing the training scripts and pre-trained models.
Thank you!
-
## Title: LLaVA-OneVision: Easy Visual Task Transfer
## Link: https://arxiv.org/abs/2408.03326
## Abstract:
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations from the LLaVA-NeXT blog series. Our experimental results…