Hi, thanks for this great project.

I have a question: since the `EagleModel` has its own KV cache (`past_key_values`), there must be some difference depending on whether the input `feature` (feature + token embedding) comes from the original LLM or from the `EagleModel` itself.
From this picture, we can say that:

- `feature[make]` and `feature[help]` in the first forward pass lead to unbiased inference, because `feature[I]` comes from the original LLM's feature space, so the `past_key_values` generated from it are also unbiased;
- `feature[with]` and `feature[you]` in the second forward pass are biased, because the input `feature[help]` comes from the Eagle feature space rather than the original LLM. The same applies to the `past_key_values` produced in this generation step.
Am I right about the two conclusions above? If so, is it also true that as the number of generated tokens grows (e.g. `max_new_tokens=1000`), the error accumulated in `past_key_values` becomes large due to the autoregressive biased input?
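To make concrete what I mean by error compounding, here is a toy numerical sketch. This is pure Python, not the actual EAGLE code; `true_step` and `draft_step` are made-up stand-ins for the base LLM's feature transition and the Eagle head's approximation of it, just to illustrate the feedback loop I'm describing:

```python
def true_step(f):
    # Stand-in for the base LLM's feature transition ("ground truth").
    return 0.9 * f + 0.1

def draft_step(f):
    # Stand-in for the Eagle head: same transition plus a small per-step error.
    return 0.9 * f + 0.1 + 0.01

def rollout(step_fn, f0, n):
    # Autoregressive rollout: each output feature is fed back as the next
    # input, mirroring how feature[help] is fed back to predict feature[with].
    feats = [f0]
    for _ in range(n):
        feats.append(step_fn(feats[-1]))
    return feats

# The first drafted position consumes an exact feature from the base LLM,
# so only the head's own one-step error appears; later positions consume
# the head's *own* outputs, so the errors can compound step by step.
true_feats = rollout(true_step, 1.0, 5)
draft_feats = rollout(draft_step, 1.0, 5)
errors = [abs(a - b) for a, b in zip(true_feats, draft_feats)]
```

In this toy setup `errors` grows monotonically with the number of autoregressive steps, which is the behavior I'm asking about for the real `past_key_values`.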