-
Hi authors,
Congrats on the nice and inspiring survey!
Could you include the **EVE** paper on *Multimodal Instruction Tuning*? Thanks in advance.
Title: Unveiling Encoder-Free Vision-Language M…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-6.5.0-35-generic-x86_64-with-glibc2.35
- P…
-
- Here's a summary from consulting an LLM specialist:
---
- We have an initial thought in #74 as follows:
![image](https://github.com/user-attachments/assets/265a3d7d-0454-4e7b-9c99-a0dd9f9ecf7c…
-
Dear authors,
Hello! I have a question regarding the two-stage fine-tuning process described in your work. Could you kindly help me understand how the two stages are connected during training? Specif…
-
### Question
Great work! I saw that both the pre-training and instruction-150K datasets have the token inserted in the same format. I was wondering why during the pre-training stage of feature alignme…
-
The executed command is as follows:
torchrun --nproc_per_node=8 /home/jn/th/work/Multimodal-GPT/mmgpt/train/instruction_finetune.py \
--lm_path /home/jn/th/work/Multimodal-GPT/checkpoints/llama-7b_hf \
--tokenizer_path /ho…
-
Paper: Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Link: https://arxiv.org/pdf/2306.14565.pdf
Name: LRV-Instruction
Focus: Multimodal
Notes: A benchmark to e…
-
I am really inspired by and thankful for your nice work.
My question is: why is the text encoder frozen during training?
When I fine-tune the VISTA model using other datasets such as M-BEIR, the results wi…
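For reference, this is roughly how I freeze the text encoder in my own fine-tuning script — a minimal sketch with illustrative module names and sizes, not VISTA's actual code:

```python
import torch
import torch.nn as nn

# Hypothetical two-tower setup; names are illustrative only.
text_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True),
    num_layers=2)
projection = nn.Linear(256, 256)  # the part that stays trainable

# Freeze the text encoder so its pre-trained weights are preserved
# and gradients flow only into the trainable projection head.
for p in text_encoder.parameters():
    p.requires_grad = False
text_encoder.eval()  # also fixes dropout behaviour during fine-tuning

optimizer = torch.optim.AdamW(
    (p for p in projection.parameters() if p.requires_grad), lr=1e-4)
```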
-
### Model description
LLaMA-VID is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. LLaMA-VID empowers existing frameworks to support…
-
### Discussion
### LLaVA-Med V1.6: Training a Large Language-and-Vision Assistant for Biomedicine in Two and a Half Hours
#### Abstract
Large Language Models (LLMs) have revolutionized natural la…