Closed Hevans123 closed 2 months ago
I want to know if the function [initialize_past_key_values()] must be used.
It's not mandatory. The pre-allocated kv_cache is not used in modeling_eagle.py.
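For readers unfamiliar with the term, here is a minimal sketch of what "pre-allocating" a KV cache means, compared with growing it on every step. It is not the repository's implementation: `PreallocatedKVCache` and `append_by_concat` are hypothetical names, and plain Python lists stand in for tensors.

```python
class PreallocatedKVCache:
    """Hypothetical fixed-capacity cache: storage is reserved once, up front."""

    def __init__(self, max_length):
        # Reserve every slot at construction time.
        self.keys = [None] * max_length
        self.values = [None] * max_length
        self.length = 0  # number of slots currently filled

    def append(self, key, value):
        # Write into an existing slot; no new buffer is created.
        if self.length >= len(self.keys):
            raise ValueError("cache is full")
        self.keys[self.length] = key
        self.values[self.length] = value
        self.length += 1

    def view(self):
        # Return only the filled portion of the cache.
        return self.keys[:self.length], self.values[:self.length]


def append_by_concat(cache, key, value):
    """Non-pre-allocated alternative: build new, longer lists on every step."""
    keys, values = cache
    return keys + [key], values + [value]


# Both approaches hold the same contents; they differ only in how memory is managed.
prealloc = PreallocatedKVCache(max_length=4)
dynamic = ([], [])
for step in range(2):
    prealloc.append(f"k{step}", f"v{step}")
    dynamic = append_by_concat(dynamic, f"k{step}", f"v{step}")
```

The two caches are functionally equivalent, which is why pre-allocation is optional; it only avoids the repeated copying that concatenation incurs.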
If I do not pre-allocate kv_cache, will the acceleration effects be worse?
The speedup ratio will not decrease, because the pre-allocated kv_cache is used in the target model, so it makes both the baseline (vanilla autoregressive decoding) and EAGLE faster. Without it, only the absolute generation speed will decrease.
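The difference between the speedup ratio and the absolute speed can be illustrated with a little arithmetic. All numbers below are made up for illustration, not measurements, and the sketch assumes that dropping pre-allocation slows the baseline and EAGLE by the same factor, which is a simplification.

```python
def speedup_ratio(eagle_tokens_per_s, baseline_tokens_per_s):
    """Speedup of EAGLE relative to vanilla autoregressive decoding."""
    return eagle_tokens_per_s / baseline_tokens_per_s


# Hypothetical throughputs with the pre-allocated kv_cache (tokens per second).
baseline_with = 40.0
eagle_with = 120.0

# Assumption: without pre-allocation, both runs slow down by the same factor.
slowdown = 0.75
baseline_without = baseline_with * slowdown
eagle_without = eagle_with * slowdown

ratio_with = speedup_ratio(eagle_with, baseline_with)
ratio_without = speedup_ratio(eagle_without, baseline_without)

print(f"with pre-allocation:    {eagle_with:.1f} tok/s, ratio {ratio_with:.2f}")
print(f"without pre-allocation: {eagle_without:.1f} tok/s, ratio {ratio_without:.2f}")
```

Under this assumption the absolute throughput drops while the ratio stays the same, since the common factor cancels in the division.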