PKU-YuanGroup / MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models
https://arxiv.org/abs/2401.15947
Apache License 2.0
1.9k stars 121 forks

Error during training on custom dataset #48

Open saeedkhaki92 opened 6 months ago

saeedkhaki92 commented 6 months ago

Describe the issue

Hello,

I am training llava-mistral on a custom dataset, and partway through training I hit the following error:

  train()
  File "/home/ubuntu/scripts/MoE-LLaVA/moellava/train/train.py", line 1465, in train
    trainer.train()
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1735, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/moellava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/ubuntu/scripts/MoE-LLaVA/moellava/model/language_model/llava_mistral.py", line 68, in forward
    ) = self.prepare_inputs_labels_for_multimodal(
  File "/home/ubuntu/scripts/MoE-LLaVA/moellava/model/llava_arch.py", line 302, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx].to(self.device)
IndexError: list index out of range

So, I was wondering if anyone can help? Thanks

LinB203 commented 6 months ago

This problem most likely occurs because the dataset does not follow the LLaVA format. Check that your JSON annotation file is consistent with the LLaVA one.
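
One way to run that check is sketched below. It assumes the usual LLaVA annotation layout: a JSON list of samples, each with a "conversations" list of {"from", "value"} turns and an optional "image" key, with "<image>" as the placeholder inside the text. The helper names (count_image_tokens, count_images, find_mismatched_samples) are hypothetical and not part of MoE-LLaVA. The script flags samples where the number of "<image>" placeholders differs from the number of images listed; a sample with more placeholders than images is a plausible, though unconfirmed, cause of the IndexError above.

```python
import json
import sys

IMAGE_TOKEN = "<image>"  # placeholder assumed from the LLaVA annotation format


def count_image_tokens(sample):
    """Count <image> placeholders across all conversation turns."""
    return sum(
        (turn.get("value") or "").count(IMAGE_TOKEN)
        for turn in sample.get("conversations", [])
    )


def count_images(sample):
    """Count images under the 'image' key (assumed to be a string or a list)."""
    image = sample.get("image")
    if image is None:
        return 0
    if isinstance(image, str):
        return 1
    return len(image)


def find_mismatched_samples(samples):
    """Return (index, n_tokens, n_images) for samples whose counts differ."""
    mismatches = []
    for idx, sample in enumerate(samples):
        n_tokens = count_image_tokens(sample)
        n_images = count_images(sample)
        if n_tokens != n_images:
            mismatches.append((idx, n_tokens, n_images))
    return mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_annotations.py path/to/annotations.json
    with open(sys.argv[1], encoding="utf-8") as f:
        data = json.load(f)
    for idx, n_tokens, n_images in find_mismatched_samples(data):
        print(f"sample {idx}: {n_tokens} <image> token(s), {n_images} image(s)")
```

Any sample it prints is worth inspecting by hand, for example a text-only sample that still contains "<image>" in a turn, or a sample with the placeholder repeated.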