Open lucasxu777 opened 3 months ago
Hello, I'm a PhD student from ZJU. I also use VideoLLaMA2 in my own research. We created a WeChat group to discuss VideoLLaMA2 issues and help each other; could you join us? Please contact me: WeChat number: LiangMeng19357260600, phone number: +86 19357260600, e-mail: liangmeng89@zju.edu.cn.
Hi, thanks again for the amazing work here! When I tried to fine-tune the model with our sample data, I was able to initialize some parts of the training, but then I hit the "cpu"-related issue shown in the picture. I ran this on Google Colab with one video and a formatted JSON file, just to get it running first. Please note that I actually downloaded these models locally on Google Colab, because otherwise I would not be able to run at all. You can assume that there is no directory issue here.
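For reference, this is roughly how I fetched the checkpoints onto the Colab disk before running the script. This is only a sketch; the Hugging Face repo IDs are my assumption based on the local paths used in the script below.

from huggingface_hub import snapshot_download

# Sketch only: repo IDs are assumed from the local paths in finetune_lora.sh;
# each call mirrors one of the locally downloaded checkpoints mentioned above.
snapshot_download("mistralai/Mistral-7B-Instruct-v0.2",
                  local_dir="/content/VideoLLaMA2/Mistral-7B-Instruct-v0.2")
snapshot_download("openai/clip-vit-large-patch14-336",
                  local_dir="/content/VideoLLaMA2/scripts/custom/openai/clip-vit-large-patch14-336")
# VideoLLaMA2-7B-Base stays in the default HF cache; only mm_projector.bin is used from it.
snapshot_download("DAMO-NLP-SG/VideoLLaMA2-7B-Base")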
---------------------------- finetune_lora.sh code: -----------------------------
#!/bin/bash

# Environment Variables
ARG_WORLD_SIZE=${1:-1}
ARG_NPROC_PER_NODE=${2:-1}
ARG_MASTER_ADDR="127.0.0.1"
ARG_MASTER_PORT=16666
ARG_RANK=0

# Multiple conditions
if [ ! -n "$WORLD_SIZE" ] || [ ! -n "$NPROC_PER_NODE" ]; then
    WORLD_SIZE=$ARG_WORLD_SIZE
    NPROC_PER_NODE=$ARG_NPROC_PER_NODE
fi
if [ ! -n "$MASTER_ADDR" ] || [ ! -n "$MASTER_PORT" ] || [ ! -n "$RANK" ]; then
    MASTER_ADDR=$ARG_MASTER_ADDR
    MASTER_PORT=$ARG_MASTER_PORT
    RANK=$ARG_RANK
fi

echo "WORLD_SIZE: $WORLD_SIZE"
echo "NPROC_PER_NODE: $NPROC_PER_NODE"

# Training Arguments
GLOBAL_BATCH_SIZE=128
LOCAL_BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=$[$GLOBAL_BATCH_SIZE/($WORLD_SIZE*$NPROC_PER_NODE*$LOCAL_BATCH_SIZE)]

# Log Arguments
export TRANSFORMERS_OFFLINE=0    # 0 = online (allow Hub downloads)
export WANDB_PROJECT=videollama2_vllava
RUN_NAME=videollama2_vllava_lora
DATA_DIR="/content/drive/My Drive/ColabNotebooks/Video_llm2/videollava_sft"
OUTP_DIR="/content/drive/My Drive/ColabNotebooks/Video_llm2/fine_tuning_result"

export CUDA_LAUNCH_BLOCKING=1
export CUDA_VISIBLE_DEVICES=0
export HF_HUB_OFFLINE=0

torchrun --nnodes=1 --nproc_per_node=1 \
    --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT \
    --node_rank=$RANK /content/VideoLLaMA2/scripts/custom/videollama2/train_flash_attn.py \
    --lora_enable True \
    --lora_r 128 \
    --lora_alpha 256 \
    --mm_projector_lr 2e-5 \
    --bits 4 \
    --deepspeed "/content/VideoLLaMA2/scripts/zero3.json" \
    --version mistral \
    --vision_tower /content/VideoLLaMA2/scripts/custom/openai/clip-vit-large-patch14-336/ \
    --mm_projector_type stc_connector \
    --model_name_or_path /content/VideoLLaMA2/Mistral-7B-Instruct-v0.2/ \
    --data_path "${DATA_DIR}/videochatgpt_llavaimage_tune.json" \
    --data_folder "${DATA_DIR}/video" \
    --pretrain_mm_mlp_adapter "/root/.cache/huggingface/hub/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/main/models--DAMO-NLP-SG--VideoLLaMA2-7B-Base/snapshots/610da3cc29bc29e16e44cb3ba340af09da4994eb/mm_projector.bin" \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --num_frames 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --output_dir "${OUTP_DIR}/finetune_${RUN_NAME}" \
    --num_train_epochs 1 \
    --per_device_train_batch_size $LOCAL_BATCH_SIZE \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 99 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --report_to tensorboard \
    --run_name $RUN_NAME
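For clarity, with the values above (1 node, 1 GPU, local batch 4, global batch 128) the gradient-accumulation arithmetic works out to 32 steps; the same expression restated in Python:

# Python restatement of the $[...] expression in the script above.
GLOBAL_BATCH_SIZE = 128
WORLD_SIZE, NPROC_PER_NODE, LOCAL_BATCH_SIZE = 1, 1, 4

grad_accum_steps = GLOBAL_BATCH_SIZE // (WORLD_SIZE * NPROC_PER_NODE * LOCAL_BATCH_SIZE)
print(grad_accum_steps)  # 32 -> effective batch size of 128 on a single GPU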
------------------------------------ full outputs -----------------------------------
I can also copy and paste the full output, which contains the error message, if needed, since it is quite long. Below is where it failed, plus a couple of lines before that.
---------------------------------- update -------------------------------
Just for your information, I also changed this so that I do not have to train from scratch, but it still gives me the same error as above.
Formatting inputs...Skip in lazy mode
Parameter Offload: Total persistent parameters: 1143040 in 387 params
  0% 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/content/VideoLLaMA2/scripts/custom/videollama2/train_flash_attn.py", line 12, in <module>
    train(attn_implementation="flash_attention_2")
  File "/content/VideoLLaMA2/scripts/custom/./videollama2/train.py", line 1035, in train
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3238, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3264, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/engine.py", line 1842, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/content/VideoLLaMA2/scripts/custom/./videollama2/model/language_model/videollama2_mistral.py", line 90, in forward
    return super().forward(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1139, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 1014, in forward
    layer_outputs = self._gradient_checkpointing_func(
  File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 482, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py", line 261, in forward
    outputs = run_function(*args)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 738, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1561, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 366, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py", line 159, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
0% 0/1 [00:04<?, ?it/s]
[2024-07-22 19:45:32,893] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 21944) of binary: /usr/bin/python3
Traceback (most recent call last):
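For what it's worth, the failing frame is the cos[position_ids] lookup in apply_rotary_pos_emb, i.e. a CPU tensor being indexed with GPU indices. Below is a tiny standalone sketch (arbitrary shapes, needs a CUDA runtime) that reproduces the same message and shows that putting both tensors on one device makes the lookup work. I suspect the --bits 4 + ZeRO-3 combination leaves the rotary cache on the CPU, but that part is only a guess.

import torch

# Minimal repro sketch (arbitrary shapes): index a CPU tensor with CUDA indices,
# the same pattern as `cos = cos[position_ids]` in modeling_mistral.py line 159.
cos = torch.randn(2048, 128)                                # rotary cache stranded on the CPU
position_ids = torch.arange(8, device="cuda").unsqueeze(0)  # indices produced on the GPU

try:
    cos[position_ids]
except RuntimeError as err:
    print(err)  # indices should be either on cpu or on the same device as the indexed tensor (cpu)

# Moving either tensor so both sides share a device makes the lookup succeed.
print(cos.to(position_ids.device)[position_ids].shape)      # torch.Size([1, 8, 128])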