../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [11,0,0], thread: [64,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [11,0,0], thread: [65,0,0] Assertion -sizes[i] <= index && index < sizes[i] && "index out of bounds" failed.
.
.
.
`File "/home/.conda/envs/llama_factory_mod/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 677, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
File "/home/.conda/envs/llama_factory_mod/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 163, in apply_rotary_pos_emb
cos = cos[position_ids].unsqueeze(unsqueeze_dim)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.`
This appears to be caused by the cos[position_ids] indexing inside qwen's rope:
def apply_rotary_pos_emb(q, k, cos, sin, position_ids, unsqueeze_dim=1):
    # Gather the rows of cos/sin at each token's position, then add a broadcast dim
    cos = cos[position_ids].unsqueeze(unsqueeze_dim)
    sin = sin[position_ids].unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
Here position_ids is:
position_ids: tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]], device='cuda:0')
The shape of cos is:
torch.Size([47, 128])
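The original sequence length was 48, but one token was skipped, so the first dimension of cos is 47. position_ids has 47 entries and skips index 12, but its largest value is still 47, which is out of bounds for a first dimension of size 47 (valid indices are 0 to 46). So does this qwen rope need to be modified to work with mod?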
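
For reference, the mismatch can be reproduced without a GPU. The following is a minimal pure-Python sketch, not the transformers implementation: `build_cos_table` and `out_of_range` are hypothetical helpers written for illustration, and the table layout (half the frequencies, duplicated) is an assumption about how the cos cache is built. It shows that a 47-row table rejects index 47, while a table sized to `max(position_ids) + 1` can be indexed with the original position ids. Whether sizing the cache this way is the right fix for mod is an open question; it is only one possible workaround.

```python
import math


def build_cos_table(num_positions, dim, base=10000.0):
    """Hypothetical helper: row p holds cos(p * inv_freq_j), duplicated to `dim` columns."""
    half = dim // 2
    inv_freq = [base ** (-(2 * j) / dim) for j in range(half)]
    table = []
    for p in range(num_positions):
        row = [math.cos(p * f) for f in inv_freq]
        table.append(row + row)  # assumed layout: frequencies concatenated with themselves
    return table


def out_of_range(position_ids, num_rows):
    """Return the ids that would fail the bounds check -num_rows <= index < num_rows."""
    return [p for p in position_ids if not (-num_rows <= p < num_rows)]


# position_ids from the report: 0..47 with 12 skipped, so 47 entries in total
position_ids = [p for p in range(48) if p != 12]

# Table sized by the shortened sequence length (47 rows), as in the report
short_table = build_cos_table(len(position_ids), dim=8)
print(out_of_range(position_ids, len(short_table)))  # [47]

# Possible workaround (assumption): size the table by the largest position instead
full_table = build_cos_table(max(position_ids) + 1, dim=8)
print(out_of_range(position_ids, len(full_table)))  # []

# Gathering now works and each kept token retains its original position
gathered = [full_table[p] for p in position_ids]
```

In this sketch the token that originally sat at position 13 ends up at row 12 of `gathered` but still carries the rotation for position 13, which is what keeping the original position ids implies.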