-
### Describe the issue
Issue:
We are trying to finetune the model on our dataset.
Currently, we are able to successfully finetune the model `lmsys/vicuna-13b-v1.5` using the projector weights `llava-v…
-
[paper](https://arxiv.org/abs/2304.08485)
## TL;DR
- **I read this because.. :** to read up before LLaVA 1.5
- **task :** chatting VLM
- **problem :** as with ChatGPT, instruction-following in the multi-modal setting…
-
I found `@torch.no_grad()` in `CLIPVisionTower.forward()`, so gradients won't flow to CLIP during training.
https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/mo…
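For context, here is a minimal sketch (not the repo's exact class; the checkpoint name is only an assumed default) of how a `torch.no_grad()`-decorated forward keeps the CLIP encoder frozen:

```python
import torch
from transformers import CLIPVisionModel

class FrozenVisionTower(torch.nn.Module):
    """Illustrative frozen CLIP vision tower, not the repo's CLIPVisionTower."""

    def __init__(self, name="openai/clip-vit-large-patch14-336"):  # assumed checkpoint
        super().__init__()
        self.vision_tower = CLIPVisionModel.from_pretrained(name)
        self.vision_tower.requires_grad_(False)  # freeze the parameters as well

    @torch.no_grad()
    def forward(self, pixel_values):
        # Everything here runs without autograd tracking, so the returned
        # features carry no grad_fn and no gradient can reach CLIP's weights.
        outputs = self.vision_tower(pixel_values, output_hidden_states=True)
        return outputs.hidden_states[-2]  # penultimate-layer features, as LLaVA selects
```

Unfreezing CLIP would therefore require removing the decorator (and the `requires_grad_(False)` call), not just adding its parameters to the optimizer.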
-
[paper](https://arxiv.org/pdf/2310.03744.pdf)
see the LLaVA notes here: https://github.com/long8v/PTIR/issues/128#issue-1749571159
## TL;DR
- **I read this because.. :** aka LLaVA 1.5 / in ShareGPT4V, LL…
-
### Describe the issue
Issue:
I am finetuning llava1.5-7B on 8 × A100 40G, and I modified the batch size & gradient accumulation steps accordingly (see the batch-size sketch below).
The estimated training time is approx. 24h.
What could go wrong?
En…
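As a sanity check on the modified batch size and accumulation steps, a minimal sketch of the effective-batch-size arithmetic; the per-device value and accumulation steps below are assumed numbers, not taken from the issue:

```python
# Keep the effective (global) batch size equal to the reference recipe when
# changing the per-device batch size or GPU count, so the LR schedule still matches.
num_gpus = 8                # single node of A100-40G, as in the issue
per_device_batch_size = 4   # assumed value that fits in 40 GB
grad_accum_steps = 4        # assumed; chosen so the product stays at 128

global_batch_size = per_device_batch_size * grad_accum_steps * num_gpus
assert global_batch_size == 128  # global batch size used by the LLaVA-1.5 finetuning recipe
print(global_batch_size)
```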
-
# URL
- https://arxiv.org/abs/2310.03744
# Affiliations
- Haotian Liu, N/A
- Chunyuan Li, N/A
- Yuheng Li, N/A
- Yong Jae Lee, N/A
# Abstract
- Large multimodal models (LMM) have recently sh…
-
Hi Haotian,
OOM happened when I ran `finetune.sh` from `scripts/v1_5`. I used a single node with 8× A100-40G, **without NVLink**, to fine-tune a 7B LLaVA-1.5.
The estimated training time is ~24 hours whe…
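One common way to work around OOM on 40 GB cards is DeepSpeed ZeRO-3 with CPU offload. The dict below is only a hedged sketch of such a config (field names follow the DeepSpeed schema), not the repo's own ZeRO-3 JSON or the settings used in this run:

```python
import json

# Illustrative DeepSpeed ZeRO-3 config with CPU offload, assumed here as one way
# to fit a 7B model plus optimizer states on A100-40G.
zero3_offload = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("zero3_offload.json", "w") as f:
    json.dump(zero3_offload, f, indent=2)  # pass this file via the --deepspeed flag
```

Offloading trades extra host-device traffic for memory headroom, so expect slower steps, especially without NVLink.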
-
### Question
1. Could you explain why the loss of LLaVA 1.5 is higher than LLaVA's (in both the pretraining and the Visual Instruction Tuning stages, I think), yet it achieves better results?
2. Also, why did the **spike**…
-
### Discussion
### LLaVA-Med V1.6: Training a Large Language-and-Vision Assistant for Biomedicine in Two and Half Hours
#### Abstract
Large Language Models (LLMs) have revolutionized natural la…
-
Hello. Thank you for your excellent work. I have some questions about the statements in the paper and hope to receive your answers. In Table 3, you compared the differences between your method and other…