X-PLUG / mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Apache License 2.0
1.12k stars 68 forks source link

problem when finetuning DocOwl1.5-Omni #82

Open lmydian1014 opened 4 weeks ago

lmydian1014 commented 4 weeks ago

I have the following error when finetune the DocOwl1.5-Omni. It always raises error when index is 10. Please help!!!

File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1735, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 247, in forward
    self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images, patch_positions)
  File "/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 115, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx]
IndexError: index 10 is out of bounds for dimension 0 with size 10
  9%|████████▋                                                                                            | 9/105 [01:00<10:45,  6.72s/it]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 1144756 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1144755) of binary: /opt/conda/envs/mplug_owl2/bin/python3.10
Traceback (most recent call last):
  File "/opt/conda/envs/mplug_owl2/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/mplug_owl2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
mplug_docowl/train/train_docowl.py FAILED

Below is my script.

#!/bin/bash
if [ $MASTER_ADDR ];then
    echo $MASTER_ADDR
    echo $MASTER_PORT
    echo $WORLD_SIZE
    echo $RANK
else
    MASTER_ADDR=127.0.0.1
    MASTER_PORT=2$(($RANDOM % 10))$(($RANDOM % 10))15
    WORLD_SIZE=1
    RANK=0
fi
# Change for multinode config
NNODES=${WORLD_SIZE}
NODE_RANK=${RANK}
#GPUS_PER_NODE=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
GPUS_PER_NODE=2
DISTRIBUTED_ARGS="--nproc_per_node $GPUS_PER_NODE --nnodes $NNODES --node_rank $NODE_RANK --master_addr $MASTER_ADDR --master_port $MASTER_PORT"
echo $DISTRIBUTED_ARGS

# change LOAD to your local path of DocOwl1.5-stage1
#LOAD='./mPLUG/DocOwl1.5-stage1'
LOAD='mPLUG/DocOwl1.5-Omni'

# batch size = per_device_train_batch_size x GPUS_PER_NODE x NNODES x gradient_accumulation_steps
#DATA_FILE=./DocDownstream-1.0/train.jsonl
DATA_FILE=/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/training_data.jsonl
IMAGE_FOLDER=/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/

torchrun $DISTRIBUTED_ARGS mplug_docowl/train/train_docowl.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --vision2text_lr 2e-5 \
    --deepspeed ./scripts/zero2.json \
    --model_name_or_path $LOAD \
    --version v1 \
    --data_path $DATA_FILE \
    --image_folder $IMAGE_FOLDER \
    --image_size 448 \
    --crop_anchors 'grid_9' \
    --add_global_img True \
    --add_textual_crop_indicator True \
    --bf16 True \
    --output_dir ./checkpoints/docowl1.5-lora \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 5 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 500 \
    --save_total_limit 4 \
    --learning_rate 1e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 3600 \
    --gradient_checkpointing True \
    --tune_vision2text True \
    --freeze_vision_model True \
    --freeze_backbone True \
    --dataloader_num_workers 4 \
    --lazy_preprocess True \
    --report_to tensorboard

I have tried to print the batch idx, it's weird that it always equal to 0. screenshot is provided below image_123650291

HAWLYQ commented 4 weeks ago

Hi, @lmydian1014 this error is similar with https://github.com/X-PLUG/mPLUG-DocOwl/issues/65. This may be because the number of image placeholders in the text input is not equal to the number of image features. There may be multiple input images and only 1 <|image|> in your query.

lmydian1014 commented 4 weeks ago

Thanks! Just to double check, if I have 5 images and each image has 3 corresponding questions. Previously the ./images folder contains 5 images and I prepared the .json file as follows

{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_content of this product?"}, {"role": "assistant", "content": "9g"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_daily_value this product?"}, {"role": "assistant", "content": "18%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_content of this product?"}, {"role": "assistant", "content": "17mg"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_daily_value this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_content of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the carbohydrate_content of this product?"}, {"role": "assistant", "content": "17g"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the carbohydrate_daily_value of this product?"}, {"role": "assistant", "content": "6%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_C_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_Content of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_C_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_Content of this product?"}, {"role": "assistant", "content": "0mcg"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}

What you mean is that I need to have 15 images in ./images folder and also need to change image name in ./images folder and in .json file as "81ouNxgyqBL_1.jpg", ''81ouNxgyqBL_2.jpg'', ''81ouNxgyqBL_3.jpg''?

HAWLYQ commented 3 weeks ago

Thanks! Just to double check, if I have 5 images and each image has 3 corresponding questions. Previously the ./images folder contains 5 images and I prepared the .json file as follows

{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_content of this product?"}, {"role": "assistant", "content": "9g"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_daily_value this product?"}, {"role": "assistant", "content": "18%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81ouNxgyqBL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_content of this product?"}, {"role": "assistant", "content": "17mg"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the added_sugar_daily_value this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_content of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51wyfiAQhvL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the calcium_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the carbohydrate_content of this product?"}, {"role": "assistant", "content": "17g"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/71NXqwfW9cL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the carbohydrate_daily_value of this product?"}, {"role": "assistant", "content": "6%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_C_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_Content of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/81TZLOCqbjL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_C_daily_value of this product?"}, {"role": "assistant", "content": "nan"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_Content of this product?"}, {"role": "assistant", "content": "0mcg"}], "task_name": "qa_sft", "dataset_name": "nutrition"}
{"image": ["/home/ubuntu/moyanli/mPLUG-DocOwl/DocOwl1.5/image/51J22uvz4jL.jpg"], "messages": [{"role": "user", "content": "<|image|>What is the vitamin_D_daily_value of this product?"}, {"role": "assistant", "content": "0%"}], "task_name": "qa_sft", "dataset_name": "nutrition"}

What you mean is that I need to have 15 images in ./images folder and also need to change image name in ./images folder and in .json file as "81ouNxgyqBL_1.jpg", ''81ouNxgyqBL_2.jpg'', ''81ouNxgyqBL_3.jpg''?

Hi, @lmydian1014 , this JSON file is ok. Please check whether all samples in your JSON file raise the IndexError or just partial samples. If only partial samples, could you provide some examples? (ps: the bath_idx=0 is because the per_device_train_batch_size=1 )