astramind-ai / Mixture-of-depths

Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
129 stars · 7 forks

qwen1.5 index out of bounds #8

Closed FB-wh closed 4 months ago

FB-wh commented 4 months ago

Error:

```
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [11,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [11,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
  File "/home/.conda/envs/llama_factory_mod/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 677, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
  File "/home/.conda/envs/llama_factory_mod/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 163, in apply_rotary_pos_emb
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

It looks like the problem comes from the `cos[position_ids]` gather inside qwen's RoPE:

```python
def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```
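For illustration, the same indexing can be reproduced outside of transformers. This is a minimal NumPy sketch with illustrative shapes (`rotate_half` re-derived from the usual HF rotary convention; all sizes here are made up, not the failing ones):

```python
import numpy as np

def rotate_half(x):
    # Swap the two halves of the last dim, negating the second half,
    # matching the usual HF rotary-embedding convention.
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # cos/sin: (max_pos, head_dim); position_ids: (batch, seq_len).
    # The gather below requires every id to be < cos.shape[0] --
    # exactly the invariant that MoD's token dropping breaks.
    cos = np.expand_dims(cos[position_ids], unsqueeze_dim)  # (batch, 1, seq, head_dim)
    sin = np.expand_dims(sin[position_ids], unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

# Illustrative shapes: batch=1, heads=2, seq=4, head_dim=6, table of 8 positions.
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 4, 6))
k = rng.standard_normal((1, 2, 4, 6))
cos = np.cos(rng.standard_normal((8, 6)))
sin = np.sin(rng.standard_normal((8, 6)))
position_ids = np.array([[0, 1, 3, 7]])  # max id 7 < 8 table rows: in bounds
q_embed, k_embed = apply_rotary_pos_emb(q, k, cos, sin, position_ids)
```

The gather works only as long as the cos/sin table has at least `position_ids.max() + 1` rows.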

where `position_ids` is:

```
position_ids: tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]], device='cuda:0')
```

and the shape of `cos` is:

```
torch.Size([47, 128])
```

The original sequence length is 48, but because one token was skipped, the first dimension of `cos` is only 47. `position_ids` no longer contains index 12, yet its maximum value is still 47, which overflows the 47-row table. So does qwen's RoPE need to be modified to work with MoD?
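One possible workaround (an assumption on my part, not necessarily what was actually changed in this issue): size the cos/sin cache by the maximum original position rather than by the pruned sequence length. A NumPy sketch of the mismatch and the fix:

```python
import numpy as np

head_dim = 128

# After MoD routing drops token 12, 47 of the 48 positions survive,
# but the ids still run up to 47.
position_ids = np.array([[p for p in range(48) if p != 12]])  # shape (1, 47)

# Buggy setup: table built from the *pruned* length, so row 47 does not exist
# and cos_pruned[position_ids] would raise an index error.
cos_pruned = np.zeros((position_ids.shape[-1], head_dim))     # (47, 128)

# Sketch of a fix: cover the maximum original position instead.
table_len = int(position_ids.max()) + 1                       # 48
cos_full = np.zeros((table_len, head_dim))                    # (48, 128)
gathered = cos_full[position_ids]                             # (1, 47, 128): in bounds
```

In the transformers code this would correspond to building the rotary cache from the largest position id rather than from the (post-routing) sequence length.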

FB-wh commented 4 months ago

Fixed it; training works now.

shufangxun commented 4 months ago

> Fixed it; training works now.

How did you fix it?

shawn0wang commented 4 months ago

I ran into the same problem. How did you fix it?