Closed lihuahua123 closed 2 months ago
This might be an index out of bounds error. Could you please provide the three tensors: out_hidden, ab, and last_nopadding?
Hi, I met the same bug when running `gen_ea_answer_vicuna.py`. The setting is the 7B model on a single GPU with `--bs 2 --temperature 0`.
The out_hidden:
```
out_hidden: tensor([[[-1.6250, -0.0670, -1.0156,  ..., -0.9043,  0.7173,  1.8477],
                     [-1.1934, -0.2004, -0.6206,  ..., -1.2842,  0.6851,  0.3882],
                     [-0.1399, -0.0886, -2.0352,  ...,  0.3835,  0.8647,  1.9785],
                     [-0.7075,  0.6113, -1.3457,  ..., -1.6953,  1.4170,  0.6382],
                     [-0.0994, -0.0469, -1.9805,  ...,  0.1660,  1.5156,  2.2891],
                     [-0.3274,  0.2817, -0.2500,  ..., -0.1040,  1.2666,  0.6035]],

                    [[ 1.5488, -1.1260, -2.4863,  ..., -1.4316, -0.9805, -0.6445],
                     [-0.7715, -2.8555, -1.1221,  ..., -1.2061,  1.1035, -0.8579],
                     [ 3.0938, -1.0400, -1.3223,  ..., -1.1436, -0.2920, -0.6118],
                     [ 0.8340, -0.1592, -0.6738,  ..., -1.3975, -1.4658, -0.6670],
                     [-0.8066, -2.4336,  0.8086,  ..., -0.5010,  0.6475, -0.2571],
                     [-3.2285, -0.9849,  2.6250,  ..., -0.4829,  0.3352, -0.2028]]],
                   device='cuda:0', dtype=torch.float16) torch.Size([2, 6, 4096])
```
The ab:
ab: (0, 1)
The last_nopadding:
last_nopadding: tensor([0, 5], device='cuda:0') torch.Size([2])
I figured out that it is because `input_ids` contains -1 entries, so the lookup `inputs_embeds = self.embed_tokens(input_ids)` goes wrong, which corrupts `position_ids` and in turn `last_nopadding = position_ids.argmax(dim=-1)`.
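That failure mode matches the dump above. A small sketch of the effect (the corrupted row is assumed here to be all zeros, which is one plausible way the first sequence's `last_nopadding` ends up as 0):

```python
import torch

# Hypothetical position_ids for a bs=2 batch of length 6: the first row has
# been corrupted by the bad padding handling, the second row is healthy.
position_ids = torch.tensor([[0, 0, 0, 0, 0, 0],
                             [0, 1, 2, 3, 4, 5]])

# argmax over the last dim is meant to find the last non-padding position,
# but on the corrupted row it returns index 0.
last_nopadding = position_ids.argmax(dim=-1)
print(last_nopadding)  # tensor([0, 5])
```

This reproduces the `tensor([0, 5])` value reported above, where index 0 for the first sequence then triggers the out-of-bounds access.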
So I added `input_ids[input_ids == -1] = 0` before `self.embed_tokens(input_ids)` in the `model.cnets.Model.forward` function, and it runs successfully:
```python
input_ids[input_ids == -1] = 0
with torch.no_grad():
    inputs_embeds = self.embed_tokens(input_ids)
```
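For reference, a self-contained sketch of the workaround, with a toy embedding table standing in for the model's `embed_tokens` (the vocabulary size, dimensions, and token ids here are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy stand-in for the model's embedding table.
embed_tokens = nn.Embedding(num_embeddings=32, embedding_dim=8)

# A bs=2 batch where -1 marks padding in the shorter sequence.
input_ids = torch.tensor([[3, 7, 1, -1, -1],
                          [5, 2, 9, 4, 6]])

# -1 is not a valid row index into the embedding table, so remap
# padding positions to token id 0 before the lookup.
input_ids[input_ids == -1] = 0
with torch.no_grad():
    inputs_embeds = embed_tokens(input_ids)

print(inputs_embeds.shape)  # torch.Size([2, 5, 8])
```

The former padding positions now get the token-0 embedding, which should be harmless as long as those positions are excluded by the attention mask downstream.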
Thank you very much!
Environment: CUDA 11.8, Python 3.8. Steps to reproduce:

```
pip install -r requirements.txt
git checkout bsne1
python example.py
```
example.py:
got this failure: