Error occurred while training DocOwl1.5 on my dataset

whalefa1I commented 4 months ago

{'loss': 1.5804, 'learning_rate': 2.1301775147929e-06, 'epoch': 0.01}                                                                                                                                                
{'loss': 1.377, 'learning_rate': 2.2485207100591717e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 1.308, 'learning_rate': 2.366863905325444e-06, 'epoch': 0.01}                                                                                                                                               
{'loss': 1.2537, 'learning_rate': 2.485207100591716e-06, 'epoch': 0.01}                                                                                                                                              
{'loss': 1.174, 'learning_rate': 2.603550295857988e-06, 'epoch': 0.01}                                                                                                                                               
  0%|▋                                                                                                                                                                          | 22/5625 [03:56<16:37:04, 10.68s/it]Traceback (most recent call last):
  File "mplug_docowl/train/train_docowl.py", line 731, in <module>
    train()
  File "mplug_docowl/train/train_docowl.py", line 708, in train
    trainer.train()
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/lib/python3.8/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1852, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/workspace/sunzheng/LVLM/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 243, in forward
    self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images, patch_positions)
  File "/mnt/workspace/sunzheng/LVLM/mPLUG-DocOwl/DocOwl1.5/mplug_docowl/model/modeling_mplug_docowl.py", line 111, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx]
IndexError: index 9 is out of bounds for dimension 0 with size 9

Have you encountered this issue before?

HAWLYQ commented 4 months ago

Hi, @whalefa1I, it seems that the format of some samples of your data may be incorrect, could you print the sample causing this error?

whalefa1I commented 4 months ago

Hi, @whalefa1I, it seems that the format of some samples of your data may be incorrect, could you print the sample causing this error?

The text content is generated by coding works, so there shouldn't be too many errors,but I'll go check it now. Is it possible that the issue is due to the images?

HAWLYQ commented 4 months ago

Hi, @whalefa1I, it seems that the format of some samples of your data may be incorrect, could you print the sample causing this error?

The text content is generated by coding works, so there shouldn't be too many errors,but I'll go check it now. Is it possible that the issue is due to the images?

Yes, this error is due to the number of image placeholders in the text input is not equal to the number of image features. Maybe there are multiple input images and only 1 <|image|> in your query?

whalefa1I commented 4 months ago

Hi, @whalefa1I, it seems that the format of some samples of your data may be incorrect, could you print the sample causing this error?

The text content is generated by coding works, so there shouldn't be too many errors,but I'll go check it now. Is it possible that the issue is due to the images?

Yes, this error is due to the number of image placeholders in the text input is not equal to the number of image features. Maybe there are multiple input images and only 1 <|image|> in your query?

I will add some assertions for verification. Thank you.

X-PLUG / mPLUG-DocOwl

Error occurred while training DocOwl1.5 on my dataset #65