DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

Fine tuning using "finetune_lora.sh" file #57

Open lucasxu777 opened 3 months ago

lucasxu777 commented 3 months ago

Hi, thanks again for the amazing work here! When I try to fine-tune the model with our sample data, I can initialize some parts of the training, but I then hit the "cpu"-related issue shown in the screenshots below. I ran this on Google Colab with one video and a formatted JSON file just to get it running first. Please note that I downloaded the models locally onto Colab because otherwise I could not run at all, so you can assume there is no directory issue here.

[Screenshot 2024-07-22 at 1.53.15 PM] [Screenshot 2024-07-22 at 1.53.29 PM]
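For reference, a minimal sketch of one way the checkpoints can be staged locally before launching the script; the repo IDs mirror the paths used in the script below, but the `local_dir` targets are illustrative, not necessarily what I used:

```python
# Illustrative only: stage the checkpoints locally so the script can point at local paths.
from huggingface_hub import snapshot_download

snapshot_download("openai/clip-vit-large-patch14-336",
                  local_dir="/content/VideoLLaMA2/scripts/custom/openai/clip-vit-large-patch14-336")
snapshot_download("mistralai/Mistral-7B-Instruct-v0.2",
                  local_dir="/content/VideoLLaMA2/Mistral-7B-Instruct-v0.2")
# The cached copy of the base model provides mm_projector.bin for --pretrain_mm_mlp_adapter.
snapshot_download("DAMO-NLP-SG/VideoLLaMA2-7B-Base")
```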

---------------------------- finetune_lora.sh code ----------------------------

```bash
#!/bin/bash

# Environment Variables
ARG_WORLD_SIZE=${1:-1}
ARG_NPROC_PER_NODE=${2:-1}
ARG_MASTER_ADDR="127.0.0.1"
ARG_MASTER_PORT=16666
ARG_RANK=0

# Multiple conditions
if [ ! -n "$WORLD_SIZE" ] || [ ! -n "$NPROC_PER_NODE" ]; then
    WORLD_SIZE=$ARG_WORLD_SIZE
    NPROC_PER_NODE=$ARG_NPROC_PER_NODE
fi
if [ ! -n "$MASTER_ADDR" ] || [ ! -n "$MASTER_PORT" ] || [ ! -n "$RANK" ]; then
    MASTER_ADDR=$ARG_MASTER_ADDR
    MASTER_PORT=$ARG_MASTER_PORT
    RANK=$ARG_RANK
fi

echo "WORLD_SIZE: $WORLD_SIZE"
echo "NPROC_PER_NODE: $NPROC_PER_NODE"

# Training Arguments
GLOBAL_BATCH_SIZE=128
LOCAL_BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=$[$GLOBAL_BATCH_SIZE/($WORLD_SIZE*$NPROC_PER_NODE*$LOCAL_BATCH_SIZE)]

# Log Arguments
export TRANSFORMERS_OFFLINE=0  # 0 = online; set to 1 for offline mode
export WANDB_PROJECT=videollama2_vllava
RUN_NAME=videollama2_vllava_lora
DATA_DIR="/content/drive/My Drive/ColabNotebooks/Video_llm2/videollava_sft"
OUTP_DIR="/content/drive/My Drive/ColabNotebooks/Video_llm2/fine_tuning_result"

export CUDA_LAUNCH_BLOCKING=1
export CUDA_VISIBLE_DEVICES=0
export HF_HUB_OFFLINE=0

torchrun --nnodes=1 --nproc_per_node=1 \
    --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT \
    --node_rank=$RANK \
    /content/VideoLLaMA2/scripts/custom/videollama2/train_flash_attn.py \
    --lora_enable True \
    --lora_r 128 \
    --lora_alpha 256 \
    --mm_projector_lr 2e-5 \
    --bits 4 \
    --deepspeed "/content/VideoLLaMA2/scripts/zero3.json" \
    --version mistral \
    --vision_tower /content/VideoLLaMA2/scripts/custom/openai/clip-vit-large-patch14-336/ \
    --mm_projector_type stc_connector \
    --model_name_or_path /content/VideoLLaMA2/Mistral-7B-Instruct-v0.2/ \
    --data_path "${DATA_DIR}/videochatgpt_llavaimage_tune.json" \
    --data_folder "${DATA_DIR}/video" \
    --pretrain_mm_mlp_adapter "/root/.cache/huggingface/hub/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/main/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/610da3cc29bc29e16e44cb3ba340af09da4994eb/mm_projector.bin" \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --num_frames 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --output_dir "${OUTP_DIR}/finetune_${RUN_NAME}" \
    --num_train_epochs 1 \
    --per_device_train_batch_size $LOCAL_BATCH_SIZE \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 99 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --report_to tensorboard \
    --run_name $RUN_NAME
```
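For reference, a minimal sketch (plain Python, not part of the script) of what the `$[...]` shell arithmetic above evaluates to with the single-GPU values used in this run:

```python
# Quick sanity check of the batch-size arithmetic in finetune_lora.sh.
GLOBAL_BATCH_SIZE = 128
WORLD_SIZE = 1         # torchrun --nnodes=1
NPROC_PER_NODE = 1     # torchrun --nproc_per_node=1
LOCAL_BATCH_SIZE = 4   # --per_device_train_batch_size

grad_accum_steps = GLOBAL_BATCH_SIZE // (WORLD_SIZE * NPROC_PER_NODE * LOCAL_BATCH_SIZE)
print(grad_accum_steps)  # 32: each optimizer step accumulates 32 micro-batches of 4 samples
```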

------------------------------------ full outputs -----------------------------------

I can also copy and paste the full output containing the error message if needed, since it is quite long. Below is where it failed, plus a couple of lines before that.

[Screenshot 2024-07-22 at 1.55.50 PM]

---------------------------------- update -------------------------------

Just for your information, I also changed this so that I do not have to train from scratch, but it still gives me the same error as above.

[Screenshot 2024-07-22 at 2.49.09 PM]

```
Formatting inputs...Skip in lazy mode
Parameter Offload: Total persistent parameters: 1143040 in 387 params
  0% 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/VideoLLaMA2/scripts/custom/videollama2/train_flash_attn.py", line 12, in <module>
    train(attn_implementation="flash_attention_2")
  File "/content/VideoLLaMA2/scripts/custom/./videollama2/train.py", line 1035, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1842, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/content/VideoLLaMA2/scripts/custom/./videollama2/model/language_model/videollama2_mistral.py", line 90, in forward
    return super().forward(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1139, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1014, in forward
    layer_outputs = self._gradient_checkpointing_func(
  File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 738, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 366, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 159, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
  0% 0/1 [00:04<?, ?it/s]
[2024-07-22 19:45:32,893] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 21944) of binary: /usr/bin/python3
Traceback (most recent call last):
```
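For what it's worth, the final RuntimeError is PyTorch's generic complaint about indexing a CPU tensor with a CUDA index tensor, which suggests the rotary-embedding cache and `position_ids` ended up on different devices inside `apply_rotary_pos_emb`. A minimal sketch of the same failure (hypothetical shapes; assumes a CUDA device is available):

```python
import torch

# Reproduce the error class from the traceback above (shapes are hypothetical).
cos = torch.randn(32, 128)                     # indexed tensor left on the CPU
position_ids = torch.arange(8, device="cuda")  # index tensor on the GPU

try:
    cos[position_ids]  # RuntimeError: indices should be either on cpu or on the same device ...
except RuntimeError as e:
    print(e)

# Putting both tensors on the same device makes the same lookup succeed:
print(cos.to("cuda")[position_ids].shape)  # torch.Size([8, 128])
```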

LiangMeng89 commented 2 days ago

Hello, I'm a PhD student from ZJU. I also use VideoLLaMA2 in my own research. We have created a WeChat group to discuss VideoLLaMA2 issues and help each other; would you like to join us? Please contact me: WeChat number: LiangMeng19357260600, phone number: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.