bisunny opened this issue 4 months ago
The base model has a layer normalization (layernorm) layer before the LM head. Since the feature sequence has already been normalized, we do not use layer normalization.
It is true that the base model has a layer normalization (layernorm) layer before the LM head, but that does not explain why EAGLE removes the input_layernorm of the llama decoder layer. I guess this is a trick to help improve EAGLE's accuracy?
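For reference, below is a minimal sketch (my own illustration, not code copied from the EAGLE repo) of the kind of modification I am asking about: a LLaMA-style decoder layer whose input_layernorm is dropped on the assumption that the incoming feature sequence has already been normalized by the base model's final norm before the LM head. Class and parameter names here are placeholders.

```python
# Sketch only: a LLaMA-style draft decoder layer with input_layernorm removed,
# assuming the incoming features were already normalized by the base model's
# final layernorm. Not the actual EAGLE implementation.
import torch
import torch.nn as nn


class DraftDecoderLayer(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, eps: float = 1e-6):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.SiLU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )
        # NOTE: no self.input_layernorm here -- the features fed to the draft
        # model are assumed to be pre-normalized by the base model.
        # (LLaMA uses RMSNorm; nn.RMSNorm requires PyTorch >= 2.4.)
        self.post_attention_layernorm = nn.RMSNorm(hidden_size, eps=eps)

    def forward(self, hidden_states: torch.Tensor, attn_mask=None) -> torch.Tensor:
        residual = hidden_states
        # A vanilla LLaMA layer would first do:
        #   hidden_states = self.input_layernorm(hidden_states)
        attn_out, _ = self.self_attn(
            hidden_states, hidden_states, hidden_states,
            attn_mask=attn_mask, need_weights=False,
        )
        hidden_states = residual + attn_out

        residual = hidden_states
        hidden_states = self.post_attention_layernorm(hidden_states)
        hidden_states = residual + self.mlp(hidden_states)
        return hidden_states
```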