RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

LianghuiGuo commented 7 months ago

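I hit this problem while fine-tuning mPLUG-Owl2. The environment was set up following the official instructions, and the data is 32 test samples built in the LLaVA data format.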

Loading checkpoint shards: 100%|██████████| 33/33 [02:49<00:00,  5.37s/it]
Loading checkpoint shards: 100%|██████████| 33/33 [02:49<00:00,  5.14s/it]
Some weights of MPLUGOwl2LlamaForCausalLM were not initialized from the model checkpoint at /data/oss_bucket_0/mplug_owl2 and are newly initialized: ['model.visual_abstractor.encoder.layers.0.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.0.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.3.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.1.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.2.crossattention.attention.q_pos_embed', 'model.visual_abstractor.encoder.layers.5.crossattention.attention.k_pos_embed', 'model.visual_abstractor.encoder.layers.4.crossattention.attention.k_pos_embed']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
ic| training_args.tune_visual_abstractor: True
ic| training_args.freeze_vision_model: True
ic| len(optimizer_grouped_parameters[0]['params']): 1040
    len(optimizer_grouped_parameters[1]['params']): 91
Using :/usr/local/ninja as PyTorch extensions root...
Loading extension module utils...
Using :/usr/local/ninja as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...

  0%|          | 0/1 [00:00<?, ?it/s]/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
  File "/checkpoint/binary/train_package/mplug_owl2/train/train_mem.py", line 13, in <module>
    train()
  File "/checkpoint/binary/train_package/mplug_owl2/train/train.py", line 801, in train
    trainer.train()
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/.local/lib/python3.8/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1736, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/.local/lib/python3.8/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_mplug_owl2.py", line 242, in forward
    outputs = self.model(
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 337, in model_forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 249, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 107, in forward
    outputs = run_function(*args)
  File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 333, in custom_forward
    return module(*inputs, past_key_value, output_attentions)
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/checkpoint/binary/train_package/mplug_owl2/model/modeling_llama2.py", line 222, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/envs/python3.8.13/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/checkpoint/binary/train_package/mplug_owl2/train/llama_flash_attn_monkey_patch.py", line 55, in forward
    query_states, key_states = apply_rotary_pos_emb(
  File "/root/.local/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 184, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

sunzhe09 commented 7 months ago

I ran into the same problem.

LianghuiGuo commented 7 months ago

I changed the device manually (moving cos and sin onto the same device as position_ids), and training now runs.

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # print(q.device, k.device, cos.device, sin.device, position_ids.device)
    # cuda:0 cuda:0 cpu cpu cuda:0
    # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
    cos = cos.squeeze(1).squeeze(0)  # [seq_len, dim]
    sin = sin.squeeze(1).squeeze(0)  # [seq_len, dim]
    # Align the device of cos and sin with position_ids
    cos = cos.to(position_ids.device)
    sin = sin.to(position_ids.device)
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
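
For anyone who wants to check the workaround outside the training stack, here is a minimal standalone sketch of the same idea. It is not the mPLUG-Owl or transformers code: `make_rotary_tables` and `apply_rotary_pos_emb_device_safe` are hypothetical names, `rotate_half` is re-implemented locally, and aligning everything to `q.device` (rather than to `position_ids.device` as in the patch above) is an assumption that should behave the same whenever `q` and `position_ids` already share a device.

```python
import torch


def rotate_half(x):
    """Map the two halves of the last dim (x1, x2) to (-x2, x1)."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat((-x2, x1), dim=-1)


def make_rotary_tables(seq_len, dim, base=10000.0):
    """Build cos/sin tables of shape [1, 1, seq_len, dim] on the CPU.

    Hypothetical helper: it mimics a cache that was created on the CPU,
    which is the situation the error message describes.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)         # [seq_len, dim / 2]
    emb = torch.cat((freqs, freqs), dim=-1)  # [seq_len, dim]
    return emb.cos()[None, None], emb.sin()[None, None]


def apply_rotary_pos_emb_device_safe(q, k, cos, sin, position_ids):
    # The first two dims of cos and sin are always 1, so squeeze them.
    cos = cos.squeeze(1).squeeze(0)  # [seq_len, dim]
    sin = sin.squeeze(1).squeeze(0)  # [seq_len, dim]
    # Assumption: q is on the compute device, so move the tables and the
    # indices there before indexing. These calls are no-ops when the
    # tensors are already on that device.
    cos = cos.to(q.device)
    sin = sin.to(q.device)
    position_ids = position_ids.to(q.device)
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    bs, heads, seq_len, dim = 2, 4, 8, 16
    cos, sin = make_rotary_tables(seq_len, dim)  # stays on the CPU
    q = torch.randn(bs, heads, seq_len, dim, device=device)
    k = torch.randn(bs, heads, seq_len, dim, device=device)
    position_ids = torch.arange(seq_len, device=device).unsqueeze(0).expand(bs, -1)
    q_embed, k_embed = apply_rotary_pos_emb_device_safe(q, k, cos, sin, position_ids)
    print(q_embed.device, tuple(q_embed.shape))
```

On a GPU machine the tables start on the CPU while `q`, `k` and `position_ids` are on `cuda:0`, which matches the device printout in the commented line of the patch above. Moving the tables on every call costs a host-to-device copy each time, so a longer-term fix would probably be to find out why the rotary cache ends up on the CPU in the first place.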

X-PLUG / mPLUG-Owl

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #190