Hi, thanks for this great project.

I have a question: since the `EagleModel` has its own KV cache (`past_key_values`), there must be some difference depending on whether the input `feature` (feature + token embedding) comes from the original LLM or from the `EagleModel` itself.
From this picture, we can say that:

- `feature[make]` and `feature[help]` in the first forward pass lead to unbiased inference, because `feature[I]` comes from the original LLM's feature space, so the `past_key_values` generated from it are also unbiased;
- `feature[with]` and `feature[you]` in the second forward pass are biased, because the input `feature[help]` comes from the Eagle feature space rather than the original LLM. The same applies to the `past_key_values` produced in this generation step.
Am I right about the two conclusions above? If so, is it also true that as the number of generated tokens grows (e.g. `max_new_tokens=1000`), the error accumulated in `past_key_values` becomes large due to the autoregressive biased input?
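To make concrete what I mean by error compounding, here is a toy numerical sketch. This is pure Python, not the actual EAGLE code; `true_step` and `draft_step` are made-up stand-ins for the base LLM's feature transition and the Eagle head's approximation of it, just to illustrate the feedback loop I'm describing:

```python
def true_step(f):
    # Stand-in for the base LLM's feature transition ("ground truth").
    return 0.9 * f + 0.1

def draft_step(f):
    # Stand-in for the Eagle head: same transition plus a small per-step error.
    return 0.9 * f + 0.1 + 0.01

def rollout(step_fn, f0, n):
    # Autoregressive rollout: each output feature is fed back as the next
    # input, mirroring how feature[help] is fed back to predict feature[with].
    feats = [f0]
    for _ in range(n):
        feats.append(step_fn(feats[-1]))
    return feats

# The first drafted position consumes an exact feature from the base LLM,
# so only the head's own one-step error appears; later positions consume
# the head's *own* outputs, so the errors can compound step by step.
true_feats = rollout(true_step, 1.0, 5)
draft_feats = rollout(draft_step, 1.0, 5)
errors = [abs(a - b) for a, b in zip(true_feats, draft_feats)]
```

In this toy setup `errors` grows monotonically with the number of autoregressive steps, which is the behavior I'm asking about for the real `past_key_values`.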