Closed · YoungjaeDev closed this issue 5 days ago
I think it is not difficult to add a LoraConfig. By the way, did you get the finetuning script working with its data.yaml file? https://github.com/LLaVA-VL/LLaVA-NeXT/issues/182#issuecomment-2311573430
@YerongLi
A recent commit seems to have added a yaml file to the script folder.
I found it as well. Thanks. I find that --lora_enable True will enable LoRA; they have a LoRA branch in train.py.
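For illustration, here is a minimal sketch of how such a --lora_enable branch typically wraps the model with PEFT; the argument names, target modules, and hyperparameters below are assumptions, not the actual LLaVA-NeXT train.py code.

```python
# Hypothetical sketch of how a --lora_enable flag typically gates PEFT
# wrapping inside a training script; argument names and target modules
# are illustrative, not the actual LLaVA-NeXT train.py code.
from peft import LoraConfig, get_peft_model

def maybe_wrap_with_lora(model, args):
    if not args.lora_enable:
        return model
    lora_config = LoraConfig(
        r=args.lora_r,                      # e.g. 4 or 128
        lora_alpha=args.lora_alpha,         # commonly 2 * lora_r
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()      # emits the "trainable params: ..." line
    return model
```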
I find that even with LoRA (--lora_r 4, --max_length 128), the training script runs into OOM with 48 GB of memory.
trainable params: 10,811,392 || all params: 8,041,160,224 || trainable%: 0.1344506476532061
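That trainable params: ... line matches the format PEFT's print_trainable_parameters() emits, and the percentage is easy to double-check:

```python
# Sanity check on the reported numbers: the LoRA adapter itself is tiny.
trainable, total = 10_811_392, 8_041_160_224
print(f"trainable%: {100 * trainable / total:.6f}")  # -> 0.134451
```

So only about 10.8M of the 8B parameters are trainable; whatever causes the OOM, it is not the adapter size.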
Does SWIFT not support LLaVA-OV LoRA fine-tuning?
Their original code only used PEFT for LoRA SFT. Let me try SWIFT; this is very new to me.
@YerongLi Is this done by customizing the dataset, and if so, how did you configure it?
Which step are you talking about? I used a subset that their training flow uses: https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data
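For reference, a single subset of that dataset can be pulled with the datasets library; the config name below is just one example subset and is an assumption, so check the dataset card for the full list:

```python
# Load one subset of lmms-lab/LLaVA-OneVision-Data; the config name is an
# example -- see the dataset card for the available subsets.
from datasets import load_dataset

ds = load_dataset("lmms-lab/LLaVA-OneVision-Data", "CLEVR-Math(MathV360K)", split="train")
print(ds[0].keys())
```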
trainable params: 10,811,392 || all params: 8,041,160,224 || trainable%: 0.1344506476532061
One thing I don't understand is why, with 1B trainable parameters, or even 3 million trainable, I still run into an OOM error.
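One plausible explanation (a back-of-the-envelope sketch, assuming bf16 weights and fp32 Adam states): the frozen base weights and the activations dominate GPU memory, so shrinking the adapter barely helps.

```python
# Rough memory estimate; assumptions: bf16 base weights, bf16 grads and
# fp32 Adam m/v states only for the trainable LoRA parameters.
total_params     = 8_041_160_224
trainable_params = 10_811_392

weights_gb = total_params * 2 / 1e9        # bf16 base weights: ~16.1 GB
grads_gb   = trainable_params * 2 / 1e9    # grads for LoRA only: ~0.02 GB
adam_gb    = trainable_params * 8 / 1e9    # fp32 m and v states: ~0.09 GB

print(f"weights {weights_gb:.1f} GB | grads {grads_gb:.2f} GB | adam {adam_gb:.2f} GB")
# The base weights alone take ~16 GB; activations for a vision-language
# model (image tokens) can consume much of what remains on a 40-48 GB card.
```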
I'm still clueless about how to fine-tune LLaVA-OneVision. Could you share a training script with me? Thanks.
I used --deepspeed scripts/zero3_offload.json with LoRA tuning (--lora_r 128 --lora_alpha 256). The OOM issue is resolved and training can proceed, but there is an error when saving the final model; I am still working on it. BTW, I train on a single A100 40G GPU.
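For context, ZeRO stage 3 with CPU offload moves parameters and optimizer state off the GPU, which is what frees the memory here. A minimal equivalent config, passed as a dict (the actual scripts/zero3_offload.json in the repo likely sets more fields):

```python
# Minimal ZeRO-3 + CPU offload config; the repo's scripts/zero3_offload.json
# likely sets additional fields. TrainingArguments accepts a dict or a path.
from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(output_dir="out", deepspeed=ds_config)
```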
Do we really have to use zero3_offload.json? That must be very slow. One thing that is strange to me: the total number of trainables here is 1B, and even after I reduced the trainables to 300M, the OOM persists.
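Since the frozen base weights and activations, not the trainables, are what fill the card, a possibly faster alternative to full CPU offload is gradient checkpointing plus a 4-bit quantized base model (QLoRA-style). A sketch under assumptions: the class and checkpoint id below refer to the HF-converted weights and are assumptions, not the LLaVA-NeXT codebase's own loading path.

```python
# Sketch: 4-bit base + gradient checkpointing instead of ZeRO-3 CPU offload.
# Assumes the HF-converted weights; the checkpoint id is an assumption.
import torch
from transformers import BitsAndBytesConfig, LlavaOnevisionForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-hf/llava-onevision-qwen2-7b-ov-hf",  # assumed checkpoint id
    quantization_config=bnb_config,
)
model.gradient_checkpointing_enable()  # recompute activations in backward
```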
I want to train a custom model using LoRA based on the model trained with the finetune_onevision script.
Thank you!