BUG DESCRIPTION:
'NaN' occasionally occurs as below:
![image](https://github.com/SafeAILab/EAGLE/assets/139844877/08832999-f404-4b78-8007-57c2fd03d01d)
The bug can be reproduced with:
- base_model: Llama-2-7b-chat-hf
- ea_model: EAGLE-llama2-chat-7B
- the following seeding snippet inserted at ea_model.py:265:

```python
import random
import torch

seed = 18
random.seed(seed)
torch.random.manual_seed(seed)
```
webui inputs (entered in order): "What is deep learning", "45", "23"
The bug occurs at step 192. The fundamental reason is an overflow of 'down_proj' in the 30th layer, which makes the softmax output 'nan'.
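For illustration, a minimal standalone sketch (not EAGLE code) of this failure mode: once a value overflows fp16 it saturates to inf, and a single inf logit makes the whole softmax come out as nan.

```python
import torch

# fp16 overflows above ~65504, so a large 'down_proj' output saturates to inf
x = torch.tensor([70000.0], dtype=torch.float16)
print(x)  # tensor([inf], dtype=torch.float16)

# one inf logit poisons the softmax: the internal inf - inf yields nan everywhere
logits = torch.tensor([float("inf"), 1.0, 2.0])
print(torch.softmax(logits, dim=0))  # tensor([nan, nan, nan])
```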
Therefore, I add two clamps, one after the attention layers and one after the MLP layers, following the T5 implementation in transformers (sketch below).
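The clamp mirrors the fp16 clamp in transformers' modeling_t5.py; the helper name clamp_fp16 below is mine, and its two call sites correspond to the insertion points described above.

```python
import torch

def clamp_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    """Clamp fp16 hidden states just below the dtype max, as T5 does in
    transformers, so the next residual add or matmul cannot overflow to inf."""
    if hidden_states.dtype == torch.float16:
        clamp_value = torch.where(
            torch.isinf(hidden_states).any(),
            torch.finfo(hidden_states.dtype).max - 1000,  # leave headroom if inf already appeared
            torch.finfo(hidden_states.dtype).max,
        )
        hidden_states = torch.clamp(hidden_states, min=-clamp_value, max=clamp_value)
    return hidden_states
```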
I think the reason the bug occurs is that the inputs to the base_model are unusual, unlike the normal sentences a model is trained on.