SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0
622 stars, 59 forks

bug fix: hidden_states nan #69

Closed: dtlzhuangz closed this 1 month ago

dtlzhuangz commented 1 month ago

BUG DESCRIPTION: NaN values occasionally appear in hidden_states, as shown below: (image)

The bug can be reproduced with the following setup:

base_model: Llama-2-7b-chat-hf
ea_model: EAGLE-llama2-chat-7B

ea_model.py:265:

    import random
    seed = 18
    random.seed(seed)
    torch.random.manual_seed(seed)

webui input: What is deep learning

The bug occurs at step 192. The root cause is an overflow in the output of 'down_proj' in the 30th layer, which makes the output of the subsequent softmax NaN. (image) (image)

Therefore, I added two clamps, one after the attention layers and one after the MLP layers, following the T5 implementation in transformers.

I think the bug occurs because the inputs to the base_model are unusual and unlike the normal sentences a model is trained on.

Liyuhui-12 commented 1 month ago

Thanks!