SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0

bsne1 branch: "last_hidden = out_hidden[ab,last_nopadding][:,None]" fails with a CUDA device-side assert #54

Closed: lihuahua123 closed this 2 months ago

lihuahua123 commented 3 months ago

Environment: CUDA 11.8, Python 3.8

pip install -r requirements.txt
git checkout bsne1
python example.py

example.py:

from model.ea_model import EaModel
from fastchat.model import get_conversation_template
import torch
model = EaModel.from_pretrained(
    base_model_path="/home/server/models/llamla-2-7b-chat",
    ea_model_path="/home/server/models/EAGLE-LLAMA2-CHAT-7b",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto"
)
# left padding
model.eval()
model.tokenizer.padding_side = "left"
model.tokenizer.pad_token = model.tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id

sys_p = "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."

your_message="Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions."
conv = get_conversation_template("llama-2-chat")
conv.system_message = sys_p
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt1 = conv.get_prompt()+" "

your_message="Hello"
conv = get_conversation_template("llama-2-chat")
conv.system_message = sys_p
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt2 = conv.get_prompt()+" "

# batch the two prompts; the shorter one is left-padded up to the longer one's length
input_s = model.tokenizer([prompt1, prompt2], return_tensors="pt", padding=True).to("cuda")
output_ids = model.eagenerate(input_s.input_ids, input_s.attention_mask, temperature=0.0, max_new_tokens=512, top_k=15)
output = model.tokenizer.batch_decode(output_ids)
print(output)

I got this failure:

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [31,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [31,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [31,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [31,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
.....
....
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [2,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [2,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "example.py", line 34, in <module>
    output_ids=model.eagenerate(input_s.input_ids,input_s.attention_mask,temperature=0.0,max_new_tokens=512,top_k=15)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/server/EAGLE-bsne1/model/ea_model.py", line 242, in eagenerate
    input_ids, tree_logits, new_token, hidden_state, sample_token,attention_mask,newfinish_flag,new_outs = update_inference_inputs(
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/server/EAGLE-bsne1/model/utils.py", line 514, in update_inference_inputs
    tree_logits = model.ea_layer.topK_genrate(draft_hidden,
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/server/EAGLE-bsne1/model/cnets.py", line 772, in topK_genrate
    last_hidden = out_hidden[ab,last_nopadding][:,None]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Liyuhui-12 commented 3 months ago

This might be an index out of bounds error. Could you please provide the three tensors: out_hidden, ab, and last_nopadding?

FelixMessi commented 3 months ago

Hi, I met the same bug when running gen_ea_answer_vicuna.py. The setting is the 7B model on a single GPU with --bs 2 --temperature 0.


The out_hidden:
out_hidden:  tensor([[[-1.6250, -0.0670, -1.0156,  ..., -0.9043,  0.7173,  1.8477],
         [-1.1934, -0.2004, -0.6206,  ..., -1.2842,  0.6851,  0.3882],
         [-0.1399, -0.0886, -2.0352,  ...,  0.3835,  0.8647,  1.9785],
         [-0.7075,  0.6113, -1.3457,  ..., -1.6953,  1.4170,  0.6382],
         [-0.0994, -0.0469, -1.9805,  ...,  0.1660,  1.5156,  2.2891],
         [-0.3274,  0.2817, -0.2500,  ..., -0.1040,  1.2666,  0.6035]],

        [[ 1.5488, -1.1260, -2.4863,  ..., -1.4316, -0.9805, -0.6445],
         [-0.7715, -2.8555, -1.1221,  ..., -1.2061,  1.1035, -0.8579],
         [ 3.0938, -1.0400, -1.3223,  ..., -1.1436, -0.2920, -0.6118],
         [ 0.8340, -0.1592, -0.6738,  ..., -1.3975, -1.4658, -0.6670],
         [-0.8066, -2.4336,  0.8086,  ..., -0.5010,  0.6475, -0.2571],
         [-3.2285, -0.9849,  2.6250,  ..., -0.4829,  0.3352, -0.2028]]],
       device='cuda:0', dtype=torch.float16) torch.Size([2, 6, 4096])
The ab:
ab:  (0, 1)
The last_nopadding:
last_nopadding:  tensor([0, 5], device='cuda:0') torch.Size([2])
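
For what it's worth, replaying that indexing on CPU with the same shapes (a quick sketch, with random data standing in for the real hidden states) succeeds, so the indexing expression itself looks in-bounds; the assert presumably fired in an earlier kernel and was only reported at this line, as the asynchronous-error note in the traceback warns:

import torch

out_hidden = torch.randn(2, 6, 4096)    # same shape as the dump above
ab = (0, 1)
last_nopadding = torch.tensor([0, 5])

# advanced indexing picks one position per batch element
last_hidden = out_hidden[ab, last_nopadding][:, None]
print(last_hidden.shape)  # torch.Size([2, 1, 4096]) -- no error on CPU
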
lihuahua123 commented 2 months ago

I figured out that it is because input_ids contains -1 (used as a padding sentinel), so inputs_embeds = self.embed_tokens(input_ids) performs an out-of-range lookup, which affects position_ids and in turn last_nopadding = position_ids.argmax(dim=-1).
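
For anyone hitting a similar crash, a minimal standalone sketch (hypothetical vocabulary size, not code from this repo) should reproduce this class of device-side assert:

import torch
import torch.nn as nn

emb = nn.Embedding(32000, 8).cuda()              # hypothetical vocab size
ids = torch.tensor([[1, 2, -1]], device="cuda")  # -1 is out of range for the table

out = emb(ids)            # the lookup kernel trips the device-side assert
torch.cuda.synchronize()  # without a sync, the error surfaces at some later op
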

So I added input_ids[input_ids == -1] = 0 before self.embed_tokens(input_ids) in the model.cnets.Model.forward function, and it runs successfully:

# map the -1 padding sentinel to a valid token id before the embedding lookup
input_ids[input_ids == -1] = 0
with torch.no_grad():
    inputs_embeds = self.embed_tokens(input_ids)
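
An equivalent non-mutating variant would be a sketch like this (assuming -1 is the only invalid id, so clamping to 0 only touches padding positions, which the attention_mask ignores anyway):

with torch.no_grad():
    inputs_embeds = self.embed_tokens(input_ids.clamp_min(0))
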
Liyuhui-12 commented 2 months ago

Thank you very much!