Could you please share your full script. This bug is usually caused by loading `mistralai/Mistral-7B-Instruct-v0.2` via `Videollama2LlamaForCausalLM`.
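Roughly, the mismatch works like this: `AutoConfig` returns a `MistralConfig` for a Mistral checkpoint, and `MistralConfig` defines no `attention_bias` field, so routing it through the Llama wrapper makes `LlamaAttention`'s `config.attention_bias` lookup fail with the AttributeError in the issue title. A minimal sketch, assuming the `Videollama2MistralForCausalLM` class name and import path from the repo layout:

```python
from transformers import AutoConfig

# Import path and class name are assumptions about the repo layout.
from videollama2.model import Videollama2MistralForCausalLM

# AutoConfig yields a MistralConfig here; MistralConfig has no
# `attention_bias` attribute, which is what Videollama2LlamaForCausalLM's
# LlamaAttention layers try to read.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

# Match the model class to the checkpoint's config class instead:
model = Videollama2MistralForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    config=config,
)
```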
This is the fine-tuning script I'm using:
```bash
#!/bin/bash
ARG_WORLD_SIZE=${1:-1}
ARG_NPROC_PER_NODE=${2:-8}
ARG_MASTER_ADDR="127.0.0.1"
ARG_MASTER_PORT=16666
ARG_RANK=0

if [ ! -n "$WORLD_SIZE" ] || [ ! -n "$NPROC_PER_NODE" ]; then
    WORLD_SIZE=$ARG_WORLD_SIZE
    NPROC_PER_NODE=$ARG_NPROC_PER_NODE
fi
if [ ! -n "$MASTER_ADDR" ] || [ ! -n "$MASTER_PORT" ] || [ ! -n "$RANK" ]; then
    MASTER_ADDR=$ARG_MASTER_ADDR
    MASTER_PORT=$ARG_MASTER_PORT
    RANK=$ARG_RANK
fi

echo "WORLD_SIZE: $WORLD_SIZE"
echo "NPROC_PER_NODE: $NPROC_PER_NODE"

GLOBAL_BATCH_SIZE=128
LOCAL_BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=$[$GLOBAL_BATCH_SIZE/($WORLD_SIZE*$NPROC_PER_NODE*$LOCAL_BATCH_SIZE)]

export TRANSFORMERS_OFFLINE=1
export WANDB_PROJECT=videollama2_vllava
RUN_NAME=videollama2_vllava_lora
DATA_DIR=datasets
OUTP_DIR=work_dirs

torchrun --nnodes $WORLD_SIZE \
    --nproc_per_node $NPROC_PER_NODE \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --node_rank $RANK \
    videollama2/train_flash_attn.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed scripts/zero3.json \
    --version v1_mistral \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type stc_connector \
    --model_name_or_path DAMO-NLP-SG/VideoLLaMA2-7B-16F \
    --data_path ${DATA_DIR}/custom_sft/custom.json \
    --data_folder ${DATA_DIR}/custom_sft/ \
    --pretrain_mm_mlp_adapter DAMO-NLP-SG/VideoLLaMA2-7B-16F-Base/mm_projector.bin \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --num_frames 8 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --output_dir ${OUTP_DIR}/${WANDB_PROJECT}/finetune_${RUN_NAME} \
    --num_train_epochs 1 \
    --per_device_train_batch_size $LOCAL_BATCH_SIZE \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 99 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --report_to tensorboard \
    --run_name $RUN_NAME
```
It seems that you want to continue fine-tuning the existing model. Please run `git pull origin main:main` and use a script like the following:
```bash
#!/bin/bash

# Environment Variables
ARG_WORLD_SIZE=${1:-1}
ARG_NPROC_PER_NODE=${2:-8}
ARG_MASTER_ADDR="127.0.0.1"
ARG_MASTER_PORT=16666
ARG_RANK=0

# Multiple conditions
if [ ! -n "$WORLD_SIZE" ] || [ ! -n "$NPROC_PER_NODE" ]; then
    WORLD_SIZE=$ARG_WORLD_SIZE
    NPROC_PER_NODE=$ARG_NPROC_PER_NODE
fi
if [ ! -n "$MASTER_ADDR" ] || [ ! -n "$MASTER_PORT" ] || [ ! -n "$RANK" ]; then
    MASTER_ADDR=$ARG_MASTER_ADDR
    MASTER_PORT=$ARG_MASTER_PORT
    RANK=$ARG_RANK
fi

echo "WORLD_SIZE: $WORLD_SIZE"
echo "NPROC_PER_NODE: $NPROC_PER_NODE"

# Training Arguments
GLOBAL_BATCH_SIZE=128
LOCAL_BATCH_SIZE=4
GRADIENT_ACCUMULATION_STEPS=$[$GLOBAL_BATCH_SIZE/($WORLD_SIZE*$NPROC_PER_NODE*$LOCAL_BATCH_SIZE)]

# Log Arguments
export TRANSFORMERS_OFFLINE=1
export WANDB_PROJECT=videollama2_vllava
RUN_NAME=videollama2_vllava_lora_debug
DATA_DIR=datasets
OUTP_DIR=work_dirs

torchrun --nnodes $WORLD_SIZE \
    --nproc_per_node $NPROC_PER_NODE \
    --master_addr=$MASTER_ADDR \
    --master_port=$MASTER_PORT \
    --node_rank $RANK \
    videollama2/train_flash_attn.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed scripts/zero3.json \
    --version v1_mistral \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type stc_connector \
    --model_name_or_path DAMO-NLP-SG/VideoLLaMA2-7B-16F \
    --data_path ${DATA_DIR}/videollava_sft/videochatgpt_llavaimage_tune.json \
    --data_folder ${DATA_DIR}/videollava_sft/ \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --num_frames 16 \
    --bf16 True \
    --tf32 True \
    --fp16 False \
    --output_dir ${OUTP_DIR}/${WANDB_PROJECT}/finetune_${RUN_NAME} \
    --num_train_epochs 1 \
    --per_device_train_batch_size $LOCAL_BATCH_SIZE \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 99 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --dataloader_num_workers 4 \
    --report_to tensorboard \
    --run_name $RUN_NAME
```
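For reference, with the defaults above (`WORLD_SIZE=1`, `NPROC_PER_NODE=8`, `LOCAL_BATCH_SIZE=4`), the accumulation arithmetic evaluates to `GRADIENT_ACCUMULATION_STEPS = 128 / (1 * 8 * 4) = 4`, so each optimizer step still covers the full global batch of 128 samples.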
Thank you very much for the updated script file. It worked!
I encountered this issue while using the continue-pretrain script. Could you help me figure out how to solve it? Thank you for your assistance.

```
ValueError: Some specified arguments are not used by the HfArgumentParser: ['--model_name_or_path', '/data/video-llama2/VideoLLaMA2-main/VideoLLaMA2-7B-16F', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False']
```
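From what I understand, `HfArgumentParser` only accepts flags that map to fields of the dataclasses it is given, so this error usually means the checked-out training script predates those arguments. A minimal sketch of the mechanism, with field names that simply mirror the flags in the error (the real dataclasses live in the training script itself):

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser


@dataclass
class ModelArguments:
    # If the running script's dataclass is missing any of these fields,
    # HfArgumentParser rejects the corresponding CLI flags with
    # "Some specified arguments are not used by the HfArgumentParser".
    model_name_or_path: Optional[str] = field(default=None)
    mm_use_im_start_end: bool = field(default=False)
    mm_use_im_patch_token: bool = field(default=False)


parser = HfArgumentParser(ModelArguments)
# Accepts e.g.: --model_name_or_path /path/to/ckpt --mm_use_im_start_end False
(model_args,) = parser.parse_args_into_dataclasses()
```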