BUG DESCRIPTION:
'NaN' occasionally occurs as below:
![image](https://github.com/SafeAILab/EAGLE/assets/139844877/08832999-f404-4b78-8007-57c2fd03d01d)
The bug can be reproduced with:
- base_model: Llama-2-7b-chat-hf
- ea_model: EAGLE-llama2-chat-7B
- the following seeding snippet inserted at ea_model.py:265:

```python
import random
import torch

seed = 18
random.seed(seed)
torch.random.manual_seed(seed)
```
webui inputs (entered in order): "What is deep learning", "45", "23"
The bug occurs at step 192. The fundamental reason is an overflow of 'down_proj' in the 30th layer, which makes the softmax output 'nan'.
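For illustration, a minimal standalone sketch (not EAGLE code) of this failure mode: once a value overflows fp16 it saturates to inf, and a single inf logit makes the whole softmax come out as nan.

```python
import torch

# fp16 overflows above ~65504, so a large 'down_proj' output saturates to inf
x = torch.tensor([70000.0], dtype=torch.float16)
print(x)  # tensor([inf], dtype=torch.float16)

# one inf logit poisons the softmax: the internal inf - inf yields nan everywhere
logits = torch.tensor([float("inf"), 1.0, 2.0])
print(torch.softmax(logits, dim=0))  # tensor([nan, nan, nan])
```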
Therefore, I add two clamps, one after the attention layers and one after the MLP layers, following the T5 implementation in transformers (sketch below).
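The clamp mirrors the fp16 clamp in transformers' modeling_t5.py; the helper name clamp_fp16 below is mine, and its two call sites correspond to the insertion points described above.

```python
import torch

def clamp_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    """Clamp fp16 hidden states just below the dtype max, as T5 does in
    transformers, so the next residual add or matmul cannot overflow to inf."""
    if hidden_states.dtype == torch.float16:
        clamp_value = torch.where(
            torch.isinf(hidden_states).any(),
            torch.finfo(hidden_states.dtype).max - 1000,  # leave headroom if inf already appeared
            torch.finfo(hidden_states.dtype).max,
        )
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states
```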
I think the reason the bug occurs is that the inputs to the base_model are unusual, unlike the normal sentences a model is trained on.