-
### Describe the issue
Issue:
We are trying to finetune the model on our dataset.
Currently, we are able to successfully fine-tune the model `lmsys/vicuna-13b-v1.5` using the projector weights `llava-v…
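As context for the projector-weights step above, here is a minimal sketch of what LLaVA-1.5's multimodal projector looks like and how pretrained weights would be loaded into it before fine-tuning. The two-layer GELU MLP and the 1024 → 5120 dimensions (CLIP ViT-L/14 features into the vicuna-13b embedding space) are assumptions based on the common "mlp2x_gelu" configuration, not the issue's actual code; the file name is illustrative.

```python
import torch
import torch.nn as nn

# Assumed dimensions: CLIP ViT-L/14 hidden size -> vicuna-13b hidden size.
vision_hidden, llm_hidden = 1024, 5120

# Sketch of the "mlp2x_gelu" projector that maps vision patch features
# into the LLM token-embedding space.
mm_projector = nn.Sequential(
    nn.Linear(vision_hidden, llm_hidden),
    nn.GELU(),
    nn.Linear(llm_hidden, llm_hidden),
)

# Loading pretrained projector weights before fine-tuning
# (illustrative path; LLaVA checkpoints save these separately):
# state = torch.load("mm_projector.bin", map_location="cpu")
# mm_projector.load_state_dict(state)

# One image's worth of patch features (24x24 = 576 patches for ViT-L/14 @ 336px).
patch_features = torch.randn(1, 576, vision_hidden)
tokens = mm_projector(patch_features)
print(tokens.shape)  # torch.Size([1, 576, 5120])
```

The projector's output tokens are simply concatenated with the text token embeddings, which is why its weights can be pretrained once and reused when swapping the base LLM stage.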
-
[paper](https://arxiv.org/abs/2304.08485)
## TL;DR
- **I read this because:** to prepare for reading LLaVA 1.5
- **task:** chatting VLM
- **problem:** as with ChatGPT, instruction-following in the multi-modal setting …
-
IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model: ours is the first work to propose visual instruction tuning with ID reference.
-
Hello! I am very interested in your work and saw that you released [the weights of Show-o](https://huggingface.co/showlab/show-o-512x512-wo-llava-tuning) before fine-tuning on the LLaVA instruction-tuning…
-
### Question
1. Could you explain why the loss of LLaVA 1.5 is higher than LLaVA's (in both the pretraining and visual instruction tuning stages, I think), yet it achieves better results?
2. Also, why did the **spike**…
-
[paper](https://arxiv.org/pdf/2310.03744.pdf)
see llava https://github.com/long8v/PTIR/issues/128#issue-1749571159 here
## TL;DR
- **I read this because:** aka LLaVA 1.5 / in ShareGPT4V, LL…
-
- Here's a summary from consulting an LLM specialist:
---
- We have an initial thought in #74 as follows:
![image](https://github.com/user-attachments/assets/265a3d7d-0454-4e7b-9c99-a0dd9f9ecf7c…
-
### Describe the issue
Issue:
I am fine-tuning llava-1.5-7B on 8 × A100 40G, and have modified the batch size and gradient-accumulation steps accordingly.
The estimated training time is approx. 24 h.
What could go wrong?
En…
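When adjusting batch size and accumulation steps for a different GPU count, the usual goal is to preserve the effective (global) batch size of the original recipe, since changing it silently alters the training dynamics and the step count. A minimal sketch of that bookkeeping, where the target of 128 and the per-device batch size of 4 are assumptions for illustration, not the issue's actual values:

```python
# Effective batch size = per-device batch * accumulation steps * GPU count.
def effective_batch_size(per_device_bs, grad_accum_steps, num_gpus):
    return per_device_bs * grad_accum_steps * num_gpus

target = 128            # assumed global batch size of the original recipe
num_gpus = 8            # 8x A100 40G
per_device_bs = 4       # assumed to fit in 40G with gradient checkpointing

# Solve for the accumulation steps that recover the target.
grad_accum = target // (per_device_bs * num_gpus)
assert effective_batch_size(per_device_bs, grad_accum, num_gpus) == target
print(grad_accum)  # 4
```

If the effective batch size is preserved, the number of optimizer steps per epoch is unchanged, so a large jump in estimated wall-clock time points at throughput (I/O, sequence packing, checkpointing) rather than the schedule.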
-
I found `@torch.no_grad()` on `CLIPVisionTower.forward()`, so gradients won't flow to CLIP during training.
https://github.com/haotian-liu/LLaVA/blob/c121f0432da27facab705978f83c4ada465e46fd/llava/mo…
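The observation above can be reproduced in isolation. The toy module below stands in for `CLIPVisionTower` (it is not the real LLaVA class): decorating `forward()` with `@torch.no_grad()` detaches its outputs from the autograd graph, so no gradient can ever reach the tower's parameters, regardless of their `requires_grad` flags.

```python
import torch
import torch.nn as nn

# Toy stand-in for CLIPVisionTower, not the actual LLaVA implementation.
class TinyTower(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.no_grad()
    def forward(self, x):
        # Everything computed here is excluded from the autograd graph.
        return self.proj(x)

tower = TinyTower()
out = tower(torch.randn(2, 4))
print(out.requires_grad)  # False: backward() can never update tower.proj

# To fine-tune the vision encoder, the decorator must be removed (and the
# parameters kept at requires_grad=True) so the graph is recorded:
out2 = tower.proj(torch.randn(2, 4))
print(out2.requires_grad)  # True
```

This is why simply setting `requires_grad=True` on the vision tower's parameters is not enough: the decorator suppresses graph construction itself.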
-
### Discussion
### LLaVA-Med V1.6: Training a Large Language-and-Vision Assistant for Biomedicine in Two and a Half Hours
#### Abstract
Large Language Models (LLMs) have revolutionized natural la…