SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Question about past_key_value modification #82

Open baihuajun24 opened 1 week ago

baihuajun24 commented 1 week ago

Hello EAGLE team! I noticed that you modified past_key_value in https://github.com/SafeAILab/EAGLE/blob/667ba930db7ea0075421f3c7df94ffbc10b93805/eagle/model/modeling_llama_kv.py#L594 by setting it to None in the forward function, compared with the upstream code at https://github.com/huggingface/transformers/blob/e51d7ac70ab8f3e69d3659226aa838308a668238/src/transformers/models/llama/modeling_llama.py#L324. Could you provide some insight into why you made this change? I am trying to generate responses with code-llama-7b using EAGLE's KVLlamaForCausalLM class, but the results are of much lower quality than those I get with the default AutoModelForCausalLM class. I suspect the KV cache affects the generation.
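For context, here is a simplified sketch of the upstream Hugging Face convention the question refers to. This is illustrative only, not the exact transformers code; the function name and simplified shapes are assumptions.

```python
import torch

def hf_attention_cache_step(key_states, value_states, past_key_value, use_cache=True):
    # In the tuple-based cache used by transformers at that commit, the attention
    # layer concatenates the new keys/values onto the cached ones along the
    # sequence dimension and returns the grown tuple, which the caller feeds
    # back into the next forward call.
    if past_key_value is not None:
        key_states = torch.cat([past_key_value[0], key_states], dim=2)
        value_states = torch.cat([past_key_value[1], value_states], dim=2)
    past_key_value = (key_states, value_states) if use_cache else None
    return key_states, value_states, past_key_value
```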

Liyuhui-12 commented 3 days ago

This modification is due to the use of a pre-allocated KV cache to improve the efficiency of the base model (this part of the code follows Medusa). In the cat operation at https://github.com/SafeAILab/EAGLE/blob/667ba930db7ea0075421f3c7df94ffbc10b93805/eagle/model/modeling_llama_kv.py#L591-L592, the key and value of the current token have already been written into past_key_value, so there is no need to return them for operations outside the model. The modification itself does not affect model quality, but if you do not reset the length attribute of the KV cache after a generation, the next generation will be abnormal.
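A toy sketch of a pre-allocated KV cache in the spirit described above, to illustrate why forward can return None and why the length must be reset between generations. The class and attribute names (PreallocatedKVCache, current_length) are assumptions for illustration, not EAGLE's actual API.

```python
import torch

class PreallocatedKVCache:
    def __init__(self, batch, num_heads, max_len, head_dim, dtype=torch.float16):
        # Buffers are allocated once at max_len, avoiding a torch.cat allocation per step.
        self.data = torch.zeros(2, batch, num_heads, max_len, head_dim, dtype=dtype)
        self.current_length = 0  # number of positions currently holding valid keys/values

    def cat(self, key_states, value_states):
        # Write the new tokens' keys/values into the pre-allocated buffers in place,
        # then advance the length. Because the cache already holds the current token,
        # the attention layer can read it directly and forward() can return None
        # for past_key_value instead of handing the tuple back to the caller.
        n = key_states.shape[2]
        self.data[0, :, :, self.current_length:self.current_length + n] = key_states
        self.data[1, :, :, self.current_length:self.current_length + n] = value_states
        self.current_length += n
        return (self.data[0, :, :, :self.current_length],
                self.data[1, :, :, :self.current_length])

    def reset(self):
        # Must be called between independent generations; otherwise the next prompt
        # attends to stale keys/values left over from the previous run, which is the
        # "abnormal generation" described above.
        self.current_length = 0
```

Under this assumption, the lower-quality outputs reported in the question would be expected if the cache length is never reset between prompts, since later generations keep attending to the previous prompt's cached keys and values.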