FineTuning Error - Githubissues

I have follow this training

LLaMA-Factory Here we provide a script for supervised finetuning Qwen2-VL with LLaMA-Factory https://github.com/hiyouga/LLaMA-Factory. This script for supervised finetuning (SFT) has the following features:

Support multi-images input;

Support single-GPU and multi-GPU training;

Support full-parameter tuning, LoRA.

In the following, we introduce more details about the usage of the script.

Installation Before you start, make sure you have installed the following packages:

Follow the instructions of LLaMA-Factory https://github.com/hiyouga/LLaMA-Factory, and build the environment. Install these packages (Optional): pip install deepspeed pip install flash-attn --no-build-isolation If you want to use FlashAttention-2 https://github.com/Dao-AILab/flash-attention, make sure your CUDA is 11.6 and above. Data Preparation LLaMA-Factory provides several training datasets in data folder, you can use it directly. If you are using a custom dataset, please prepare your dataset as follows.

Organize your data in a json file and put your data in data folder. LLaMA-Factory supports multimodal dataset in sharegpt format. The dataset in sharegpt format should follow the below format: [ { "messages": [ { "content": "Who are they?", "role": "user" }, { "content": "They're Kane and Gretzka from Bayern Munich.", "role": "assistant" }, { "content": "What are they doing?", "role": "user" }, { "content": "They are celebrating on the soccer field.", "role": "assistant" } ], "images": [ "mllm_demo_data/1.jpg", "mllm_demo_data/1.jpg" ] }, ] Provide your dataset definition in data/dataset_info.json in the following format . For sharegpt format dataset, the columns in dataset_info.json should be: "dataset_name": { "file_name": "dataset_name.json", "formatting": "sharegpt", "columns": { "messages": "messages", "images": "images" }, "tags": { "role_tag": "role", "content_tag": "content", "user_tag": "user", "assistant_tag": "assistant" } } Training Lora SFT examples:

llamafactory-cli train examples/train_lora/qwen2vl_lora_sft.yaml llamafactory-cli export examples/merge_lora/qwen2vl_lora_sft.yaml Full SFT examples:

llamafactory-cli train examples/train_full/qwen2vl_full_sft.yaml Inference examples:

llamafactory-cli webchat examples/inference/qwen2_vl.yaml llamafactory-cli api examples/inference/qwen2_vl.yaml Execute the following training command:

DISTRIBUTED_ARGS=" --nproc_per_node $NPROC_PER_NODE \ --nnodes $NNODES \ --node_rank $NODE_RANK \ --master_addr $MASTER_ADDR \ --master_port $MASTER_PORT "

torchrun $DISTRIBUTED_ARGS src/train.py \ --deepspeed $DS_CONFIG_PATH \ --stage sft \ --do_train \ --model_name_or_path Qwen/Qwen2-VL-7B-Instruct \ --dataset mllm_demo \ --template qwen2_vl \ --finetuning_type lora \ --output_dir $OUTPUT_PATH \ --overwrite_cache \ --overwrite_output_dir \ --warmup_steps 100 \ --weight_decay 0.1 \ --per_device_train_batch_size 2 \ --gradient_accumulation_steps 4 \ --ddp_timeout 9000 \ --learning_rate 5e-6 \ --lr_scheduler_type cosine \ --logging_steps 1 \ --cutoff_len 4096 \ --save_steps 1000 \ --plot_loss \ --num_train_epochs 3 \ --bf16 and enjoy the training process. To make changes to your training, you can modify the arguments in the training command to adjust the hyperparameters. One argument to note is cutoff_len, which is the maximum length of the training data. Control this parameter to avoid OOM error.

but fail wih Error

(llma_factory) (base) gitlab@AIMACHINE:~/training_qwen2_vl$ ./run_training.sh [2024-09-01 21:05:57,109] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-09-01 21:06:03,328] [INFO] [comm.py:652:init_distributed] cdb=None [2024-09-01 21:06:03,328] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl 09/01/2024 21:06:03 - WARNING - llamafactory.hparams.parser - ddp_find_unused_parameters needs to be set as False for LoRA in DDP training. 09/01/2024 21:06:03 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16 [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,769 >> loading file vocab.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/vocab.json [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,769 >> loading file merges.txt from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/merges.txt [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,769 >> loading file tokenizer.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/tokenizer.json [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,769 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,770 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:03,770 >> loading file tokenizer_config.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/tokenizer_config.json [INFO|tokenization_utils_base.py:2426] 2024-09-01 21:06:04,206 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|image_processing_base.py:375] 2024-09-01 21:06:04,997 >> loading configuration file preprocessor_config.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/preprocessor_config.json [INFO|image_processing_base.py:375] 2024-09-01 21:06:05,265 >> loading configuration file preprocessor_config.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/preprocessor_config.json [INFO|image_processing_base.py:429] 2024-09-01 21:06:05,266 >> Image processor Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "max_pixels": 12845056, "min_pixels": 3136 }, "temporal_patch_size": 2 }

[INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file vocab.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/vocab.json [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file merges.txt from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/merges.txt [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file tokenizer.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/tokenizer.json [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file added_tokens.json from cache at None [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:2182] 2024-09-01 21:06:05,526 >> loading file tokenizer_config.json from cache at /home/gitlab/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/f8732c745fc73669416c48c046489fbbfa70ea2f/tokenizer_config.json [INFO|tokenization_utils_base.py:2426] 2024-09-01 21:06:05,941 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO|processing_utils.py:722] 2024-09-01 21:06:07,019 >> Processor Qwen2VLProcessor:

image_processor: Qwen2VLImageProcessor { "do_convert_rgb": true, "do_normalize": true, "do_rescale": true, "do_resize": true, "image_mean": [ 0.48145466, 0.4578275, 0.40821073 ], "image_processor_type": "Qwen2VLImageProcessor", "image_std": [ 0.26862954, 0.26130258, 0.27577711 ], "max_pixels": 12845056, "merge_size": 2, "min_pixels": 3136, "patch_size": 14, "processor_class": "Qwen2VLProcessor", "resample": 3, "rescale_factor": 0.00392156862745098, "size": { "max_pixels": 12845056, "min_pixels": 3136 }, "temporal_patch_size": 2 }
tokenizer: Qwen2TokenizerFast(name_or_path='Qwen/Qwen2-VL-2B-Instruct', vocab_size=151643, model_max_length=32768, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'eos_token': '<|im_end|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>', '<|object_ref_start|>', '<|object_ref_end|>', '<|box_start|>', '<|box_end|>', '<|quad_start|>', '<|quad_end|>', '<|vision_start|>', '<|vision_end|>', '<|vision_pad|>', '<|image_pad|>', '<|video_pad|>']}, clean_up_tokenization_spaces=False), added_tokens_decoder={ 151643: AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151644: AddedToken("<|im_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151645: AddedToken("<|im_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151646: AddedToken("<|object_ref_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151647: AddedToken("<|object_ref_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151648: AddedToken("<|box_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151649: AddedToken("<|box_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151650: AddedToken("<|quad_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151651: AddedToken("<|quad_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151652: AddedToken("<|vision_start|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151653: AddedToken("<|vision_end|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151654: AddedToken("<|vision_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151655: AddedToken("<|image_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), 151656: AddedToken("<|video_pad|>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True), }

{ "chat_template": "{% set image_count = namespace(value=0) %}{% set video_count = namespace(value=0) %}{% for message in messages %}{% if loop.first and message['role'] != 'system' %}<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n{% endif %}<|im_start|>{{ message['role'] }}\n{% if message['content'] is string %}{{ message['content'] }}<|im_end|>\n{% else %}{% for content in message['content'] %}{% if content['type'] == 'image' or 'image' in content or 'image_url' in content %}{% set image_count.value = image_count.value + 1 %}{% if add_vision_id %}Picture {{ image_count.value }}: {% endif %}<|vision_start|><|image_pad|><|vision_end|>{% elif content['type'] == 'video' or 'video' in content %}{% set video_count.value = video_count.value + 1 %}{% if add_vision_id %}Video {{ video_count.value }}: {% endif %}<|vision_start|><|video_pad|><|vision_end|>{% elif 'text' in content %}{{ content['text'] }}{% endif %}{% endfor %}<|im_end|>\n{% endif %}{% endfor %}{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}", "processor_class": "Qwen2VLProcessor" }

09/01/2024 21:06:07 - INFO - llamafactory.data.template - Replace eos token: <|im_end|> 09/01/2024 21:06:07 - INFO - llamafactory.data.loader - Loading dataset cccd_data.json... Generating train split: 2 examples [00:00, 71.87 examples/s] Converting format of dataset: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 127.88 examples/s] Running tokenizer on dataset: 0%| | 0/2 00:00<?, ? examples/s: Traceback (most recent call last): rank0: File "/home/gitlab/training_qwen2_vl/src/train.py", line 28, in

rank0: File "/home/gitlab/training_qwen2_vl/src/train.py", line 19, in main

rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/train/tuner.py", line 50, in run_exp rank0: run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks) rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/train/sft/workflow.py", line 46, in run_sft rank0: dataset_module = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)

rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/data/loader.py", line 237, in get_dataset rank0: dataset = _get_preprocessed_dataset(

rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/data/loader.py", line 183, in _get_preprocessed_dataset rank0: dataset = dataset.map(preprocess_func, batched=True, remove_columns=column_names, **kwargs)

rank0: File "/home/gitlab/miniconda3/envs/llma_factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 602, in wrapper rank0: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)

rank0: File "/home/gitlab/miniconda3/envs/llma_factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 567, in wrapper rank0: out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)

rank0: File "/home/gitlab/miniconda3/envs/llma_factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3167, in map rank0: for rank, done, content in Dataset._map_single(**dataset_kwargs): rank0: File "/home/gitlab/miniconda3/envs/llma_factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3558, in _map_single rank0: batch = apply_function_on_filtered_inputs(

rank0: File "/home/gitlab/miniconda3/envs/llma_factory/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3427, in apply_function_on_filtered_inputs rank0: processed_inputs = function(fn_args, additional_args, **fn_kwargs)

rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/data/processors/supervised.py", line 104, in preprocess_supervised_dataset rank0: prompt = template.mm_plugin.process_messages(examples["prompt"][i], examples["images"][i], processor)

rank0: File "/home/gitlab/training_qwen2_vl/src/llamafactory/data/mm_plugin.py", line 234, in process_messages rank0: self.image_token * (image_grid_thw[index].prod() // merge_length)

rank0: IndexError: index 2 is out of bounds for dimension 0 with size 2 How I fix this error? my dataset here: [ { "messages": [ { "content": "Please provide the name on this .", "role": "user" }, { "content": "The name is test1", "role": "assistant" }, { "content": "What is the date of birth shown on this ?", "role": "user" }, { "content": "The date of birth is 04-08-1985.", "role": "assistant" }, { "content": "What is the ID number on this ?", "role": "user" }, { "content": "The ID number is 1111.", "role": "assistant" }, { "content": "What is the sex (gender) shown on this ?", "role": "user" }, { "content": "The sex is nữ.", "role": "assistant" }, { "content": "What is the nationality on this ?", "role": "user" }, { "content": "The nationality is US.", "role": "assistant" }, { "content": "What is the home address on this ?", "role": "user" }, { "content": "The home address is USA.", "role": "assistant" }, { "content": "What is the date of issue of this ?", "role": "user" }, { "content": "The date of issue is 11-07-2022.", "role": "assistant" } ], "images": [ "data/images/051185009300_Front.jpg", "data/images/051185009300_Back.jpg" ] } ]

QwenLM / Qwen2-VL

FineTuning Error #60