Question about EAGLE-2

Yipeng1994 commented 1 month ago

How can one prove that EAGLE-2 leaves the distribution of the generated text unchanged?

Suppose you use a draft model that is identical to the target model. We have a distribution p(.|.) for the target model and q(.|.) for the draft model.

p=q.

Suppose the next-token probabilities after "I" are "am" (0.51) and "have" (0.49). After top-k and reranking, we attempt to accept the first token "am" with probability:

min(1, p("am")/q("am")) = min(1, 0.51/0.51) = 1

In this scenario, the target model on its own would generate "have" with probability 0.49. With EAGLE-2, however, "am" is always accepted, so the probability of generating "have" becomes 0. This suggests that EAGLE-2 modifies the sampling behavior of the target model.

It seems that EAGLE-2 turns the sampling strategy of the target model into the greedy sampling strategy of the draft model.

I am uncertain if I have overlooked any details in the paper that would clarify this conclusion. Could you assist me in understanding the precise mechanics of how EAGLE-2 operates to preserve the distribution while implementing top-k filtering and reranking? Your insight would be greatly appreciated in elucidating this aspect of EAGLE-2's functionality.

Liyuhui-12 commented 1 month ago

When using top-k, the actual sampling distribution is not q, and the probability of the first draft token being "am" is 1.0.
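
A minimal sketch of this point, assuming the standard speculative sampling rule from https://arxiv.org/pdf/2211.17192 (accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized max(0, p - q)). The function name `speculative_output_distribution` is hypothetical, and this is not code from the EAGLE repository. It computes the exact output distribution of one step when the effective draft distribution puts all of its mass on "am":

```python
def speculative_output_distribution(p, q):
    """Exact output distribution of one speculative sampling step.

    p: target probabilities, q: effective draft probabilities
    (dicts mapping token -> probability over the same tokens).
    """
    tokens = list(p)
    # P(x is drafted and accepted) = q(x) * min(1, p(x)/q(x)) = min(p(x), q(x)).
    accepted = {x: min(p[x], q.get(x, 0.0)) for x in tokens}
    reject_prob = 1.0 - sum(accepted.values())
    # After a rejection, resample from the normalized residual max(0, p - q).
    residual = {x: max(0.0, p[x] - q.get(x, 0.0)) for x in tokens}
    norm = sum(residual.values())
    out = {}
    for x in tokens:
        resampled = reject_prob * residual[x] / norm if norm > 0 else 0.0
        out[x] = accepted[x] + resampled
    return out


p = {"am": 0.51, "have": 0.49}
q_top1 = {"am": 1.0, "have": 0.0}  # the draft always proposes "am"
result = speculative_output_distribution(p, q_top1)
print(result)
```

Under these assumptions, "am" is accepted with probability 0.51 rather than 1, and the remaining 0.49 of the mass goes to "have" through the residual resampling, so the output distribution matches p up to floating-point rounding.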

yanjunplay commented 1 month ago

@Yipeng1994 I think your question is less about EAGLE-2 and more about speculative sampling itself, which was proposed in https://arxiv.org/pdf/2211.17192. EAGLE-2 and EAGLE-1 adopt the same logic as described in the papers. I strongly suggest reading the original speculative decoding paper https://arxiv.org/pdf/2211.17192, especially Section 2.3 and Appendix A.1.

Yipeng1994 commented 1 month ago

> When using top-k, the actual sampling distribution is not q, and the probability of the first draft token being "am" is 1.0.

I see. It appears that EAGLE-2 does not directly increase the acceptance rate of individual tokens. Instead, it selects a larger set of draft tokens with higher overall confidence scores, which improves the average acceptance length per forward pass.

One more question, @Liyuhui-12: for the remaining tokens (here, "have"), does EAGLE-2 fall back to naive sampling, or does it still apply the acceptance formula below?

min(1, p(x)/q(x))

> EAGLE-2 and EAGLE-1 adopted the same logic as mentioned in the papers. I strongly suggest to read the original speculative decoding paper https://arxiv.org/pdf/2211.17192, especially section 2.3 and appendix A.1

Thanks for your suggestion. I will read through it. @yanjunplay

Liyuhui-12 commented 1 month ago

You can refer to Appendix B of the EAGLE paper https://arxiv.org/abs/2401.15077 for the specific sampling algorithm. It is a recursive version of the algorithm in the original speculative sampling paper https://arxiv.org/pdf/2211.17192.

scott306lr commented 4 weeks ago

EAGLE employs top-k sampling for verification, sampling tokens in order from left to right.

Therefore, in the for loop (i ≤ k) of multi-round speculative sampling (the method in EAGLE Appendix B), the draft model's probability of sampling the current token x[i] should always be 1, with every other token at 0. As a result, the acceptance test for every token is always r < target_prob(x[i]) / 1, where target_prob is updated at every step by setting target_prob of the rejected tokens to 0 and renormalizing.

Is this correct?
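
As a rough check of the scheme described above (not the EAGLE implementation; `multi_round_output_distribution` and its arguments are hypothetical names), the sketch below assumes each candidate is proposed with draft probability 1, is accepted with probability target_prob(x[i]) / 1, and on rejection has its target probability set to 0 before renormalizing. It computes the exact distribution of the token that is finally emitted:

```python
def multi_round_output_distribution(target, candidates):
    """Exact output distribution when candidates are tried in order.

    target: dict mapping token -> target probability.
    candidates: ordered list of distinct tokens from target, each treated
    as drafted with probability 1 (an assumption of this sketch).
    """
    remaining = dict(target)  # target distribution given all rejections so far
    reach = 1.0               # probability of reaching the current round
    out = {t: 0.0 for t in target}
    for x in candidates:
        accept = min(1.0, remaining[x] / 1.0)  # draft probability is 1
        out[x] += reach * accept
        reach *= 1.0 - accept
        # Zero the rejected token, then renormalize the rest.
        remaining[x] = 0.0
        total = sum(remaining.values())
        if reach <= 0.0 or total <= 0.0:
            return out
        remaining = {t: v / total for t, v in remaining.items()}
    # Every candidate was rejected: sample from the adjusted distribution.
    for t, v in remaining.items():
        out[t] += reach * v
    return out


target = {"am": 0.5, "have": 0.3, "was": 0.2}
result = multi_round_output_distribution(target, ["am", "have"])
print(result)
```

With these example numbers, "am" is emitted with probability 0.5, "have" with 0.5 × 0.6 = 0.3, and "was" with 0.5 × 0.4 × 1.0 = 0.2, which agrees with the target distribution up to floating-point rounding. This is consistent with the description above, at least for this toy case.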

SafeAILab / EAGLE

Question about EAGLE-2 #110