-
### Describe the issue
Issue:
We are trying to finetune the model on our dataset.
Currently, we are able to successfully fine-tune the model `lmsys/vicuna-13b-v1.5` using the projector weights `llava-v…
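As context for the projector-weights step above, here is a minimal sketch of what LLaVA-1.5's multimodal projector looks like and how pretrained weights would be loaded into it before fine-tuning. The two-layer GELU MLP and the 1024 → 5120 dimensions (CLIP ViT-L/14 features into the vicuna-13b embedding space) are assumptions based on the common "mlp2x_gelu" configuration, not the issue's actual code; the file name is illustrative.

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-L/14 hidden size -> vicuna-13b hidden size.
vision_hidden, llm_hidden = 1024, 5120

# Sketch of the "mlp2x_gelu" projector that maps vision patch features
# into the LLM token-embedding space.
mm_projector = nn.Sequential(
    nn.Linear(vision_hidden, llm_hidden),
    nn.GELU(),
    nn.Linear(llm_hidden, llm_hidden),
)

# Loading pretrained projector weights before fine-tuning
# (illustrative path; LLaVA checkpoints save these separately):
# state = torch.load("mm_projector.bin", map_location="cpu")
# mm_projector.load_state_dict(state)

# One image's worth of patch features (24x24 = 576 patches for ViT-L/14 @ 336px).
patch_features = torch.randn(1, 576, vision_hidden)
tokens = mm_projector(patch_features)
print(tokens.shape)  # torch.Size([1, 576, 5120])
```

The projector's output tokens are simply concatenated with the text token embeddings, which is why its weights can be pretrained once and reused when swapping the base LLM stage.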
-
[paper](https://arxiv.org/abs/2304.08485)
## TL;DR
- **I read this because:** to prepare for reading LLaVA 1.5
- **task:** chatting VLM
- **problem:** as with ChatGPT, instruction-following in the multi-modal setting …
-
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model: ours is the first work to propose visual instruction tuning with ID reference.
-
Hello! I am very interested in your work and saw that you released [the weights of Show-o](https://huggingface.co/showlab/show-o-512x512-wo-llava-tuning) before fine-tuning on the LLaVA instruction-tuning…
-
### Question
1. Could you explain why the loss of LLaVA 1.5 is higher than LLaVA's (in both the pretraining and visual instruction tuning stages, I think), yet it achieves better results?
2. Also, why did the **spike**…
-
[paper](https://arxiv.org/pdf/2310.03744.pdf)
see llava https://github.com/long8v/PTIR/issues/128#issue-1749571159 here
## TL;DR
- **I read this because:** aka LLaVA 1.5 / in ShareGPT4V, LL…
-
- Here's a summary from consulting an LLM specialist:
---
- We have an initial thought in #74 as follows:
![image](https://github.com/user-attachments/assets/265a3d7d-0454-4e7b-9c99-a0dd9f9ecf7c…
-
### Describe the issue
Issue:
I am fine-tuning llava-1.5-7B on 8 × A100 40G, and have modified the batch size and gradient-accumulation steps accordingly.
The estimated training time is approx. 24 h.
What could go wrong?
En…
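When adjusting batch size and accumulation steps for a different GPU count, the usual goal is to preserve the effective (global) batch size of the original recipe, since changing it silently alters the training dynamics and the step count. A minimal sketch of that bookkeeping, where the target of 128 and the per-device batch size of 4 are assumptions for illustration, not the issue's actual values:

```python
# Effective batch size = per-device batch * accumulation steps * GPU count.
def effective_batch_size(per_device_bs, grad_accum_steps, num_gpus):
    return per_device_bs * grad_accum_steps * num_gpus

target = 128            # assumed global batch size of the original recipe
num_gpus = 8            # 8x A100 40G
per_device_bs = 4       # assumed to fit in 40G with gradient checkpointing

# Solve for the accumulation steps that recover the target.
grad_accum = target // (per_device_bs * num_gpus)
assert effective_batch_size(per_device_bs, grad_accum, num_gpus) == target
print(grad_accum)  # 4
```

If the effective batch size is preserved, the number of optimizer steps per epoch is unchanged, so a large jump in estimated wall-clock time points at throughput (I/O, sequence packing, checkpointing) rather than the schedule.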
-
I found `@torch.no_grad()` on `CLIPVisionTower.forward()`, so gradients won't flow to CLIP during training.
https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/mo…
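The observation above can be reproduced in isolation. The toy module below stands in for `CLIPVisionTower` (it is not the real LLaVA class): decorating `forward()` with `@torch.no_grad()` detaches its outputs from the autograd graph, so no gradient can ever reach the tower's parameters, regardless of their `requires_grad` flags.

```python
import torch
import torch.nn as nn

# Toy stand-in for CLIPVisionTower, not the actual LLaVA implementation.
class TinyTower(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.no_grad()
    def forward(self, x):
        # Everything computed here is excluded from the autograd graph.
        return self.proj(x)

tower = TinyTower()
out = tower(torch.randn(2, 4))
print(out.requires_grad)  # False: backward() can never update tower.proj

# To fine-tune the vision encoder, the decorator must be removed (and the
# parameters kept at requires_grad=True) so the graph is recorded:
out2 = tower.proj(torch.randn(2, 4))
print(out2.requires_grad)  # True
```

This is why simply setting `requires_grad=True` on the vision tower's parameters is not enough: the decorator suppresses graph construction itself.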
-
### Discussion
### LLaVA-Med V1.6: Training a Large Language-and-Vision Assistant for Biomedicine in Two and a Half Hours
#### Abstract
Large Language Models (LLMs) have revolutionized natural la…