Closed Yipeng1994 closed 1 month ago
When using top-k, the actual sampling distribution is not q, and the probability of the first draft token being "am" is 1.0.
@Yipeng1994 I think your question is less about EAGLE-2 itself and more about speculative sampling, which was proposed in https://arxiv.org/pdf/2211.17192. EAGLE-2 and EAGLE-1 adopt the same logic as described in the papers. I strongly suggest reading the original speculative decoding paper https://arxiv.org/pdf/2211.17192, especially Section 2.3 and Appendix A.1.
I see. It appears that EAGLE-2 does not directly improve the acceptance rate of individual tokens. Instead, it uses a larger number of draft tokens with a higher summed confidence score to increase the average acceptance length per forward pass.
One more question: for the remaining tokens, which in this case is "have", does EAGLE-2 fall back to naive sampling and no longer use the acceptance formula
min(1, p(x)/q(x))
(with p the target distribution and q the draft distribution)? @Liyuhui-12
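For reference, here is a minimal sketch of a single step of standard speculative sampling as described in https://arxiv.org/pdf/2211.17192, with p the target distribution and q the draft distribution. It computes the exact distribution of the emitted token rather than sampling. The function name and the dict-based representation are illustrative only and are not taken from the EAGLE code base. Treating the top-k draft as a one-hot q on "am" is an assumption taken from this discussion, not something confirmed by the papers.

```python
def speculative_step_distribution(p, q):
    """Exact distribution of the token emitted by one speculative sampling step.

    p: target distribution, q: draft distribution (dicts: token -> probability).
    Hypothetical helper for illustration; not part of any EAGLE API.
    """
    tokens = list(p)
    # Probability that x is drafted and then accepted: q(x) * min(1, p(x)/q(x)).
    accepted = {
        x: q[x] * min(1.0, p[x] / q[x]) if q[x] > 0 else 0.0
        for x in tokens
    }
    reject_mass = 1.0 - sum(accepted.values())
    # On rejection, the token is resampled from the normalized residual max(0, p - q).
    residual = {x: max(0.0, p[x] - q[x]) for x in tokens}
    norm = sum(residual.values())
    out = dict(accepted)
    if norm > 0:
        for x in tokens:
            out[x] += reject_mass * residual[x] / norm
    return out


p = {"am": 0.51, "have": 0.49}  # target probabilities from the example in this thread
q = {"am": 1.0, "have": 0.0}    # assumption: the draft proposes "am" deterministically
print(speculative_step_distribution(p, q))
```

Under these assumptions, "am" is accepted with probability 0.51, and in the remaining 0.49 of cases the residual distribution puts all of its mass on "have", so the emitted token still follows p up to floating-point rounding.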
Thanks for your suggestion. I will read through it. @yanjunplay
You can refer to Appendix B of the EAGLE paper https://arxiv.org/abs/2401.15077 for the specific sampling algorithm. It is a recursive version of the algorithm in the original speculative sampling paper https://arxiv.org/pdf/2211.17192.
EAGLE employs top-k sampling for verification, sampling tokens in order from left to right.
Therefore, in the for loop (i ≤ k) of multi-round speculative sampling (the method in EAGLE Appendix B), the probability of the draft model sampling the current token x[i] should always be 1 (with the probability of every other token being 0). As a result, the acceptance test for every token is always r < target_prob(x[i])/1, where target_prob is updated at every step by setting target_prob of the rejected tokens to 0 and renormalizing.
Is this correct?
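To make the procedure in the question concrete, here is a minimal sketch that follows it literally: candidates are tried from left to right, each is accepted with probability target_prob(x[i])/1, and a rejected token is zeroed out and the target renormalized. It computes the exact distribution of the emitted token. The function name, the candidate list, and the three-token vocabulary are hypothetical, and this is a reading of the description above, not the EAGLE implementation.

```python
def multi_round_distribution(target_probs, candidates):
    """Exact distribution of the emitted token when candidates are tried in order.

    target_probs: dict token -> target probability.
    candidates:   draft tokens in left-to-right order; each is assumed to have
                  draft probability 1 (deterministic top-k pick).
    """
    residual = dict(target_probs)  # current (renormalized) target distribution
    reach = 1.0                    # probability that all earlier candidates were rejected
    out = {tok: 0.0 for tok in target_probs}
    for tok in candidates:
        accept = min(1.0, residual[tok] / 1.0)  # r < target_prob(x[i]) / 1
        out[tok] += reach * accept
        reach *= 1.0 - accept
        if reach == 0.0:
            return out
        # Set the rejected token's probability to 0 and renormalize.
        residual[tok] = 0.0
        total = sum(residual.values())
        residual = {t: pr / total for t, pr in residual.items()}
    # Every candidate was rejected: sample from what is left of the target.
    for t, pr in residual.items():
        out[t] += reach * pr
    return out


target = {"a": 0.5, "b": 0.3, "c": 0.2}  # made-up target distribution
print(multi_round_distribution(target, ["b", "a"]))
```

In this sketch each candidate ends up emitted with exactly its original target probability, because the chance of reaching step i cancels the renormalization factor, and tokens that were never proposed are still reachable through the final residual sample.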
How to prove that EAGLE-2 ensures that the distribution of the generated text remains unchanged?
Suppose you use a draft model that is identical to the target model. We have a distribution p(.|.) for the target model and q(.|.) for the draft model.
Suppose the next-token probabilities after "I" are "am" (0.51) and "have" (0.49). After top-k and reranking, we attempt to accept the first token "am" with a probability:
In this scenario, the target model has a 0.49 probability of accepting the token "have". However, when EAGLE-2 is applied, the probability of accepting "have" becomes 0, illustrating how EAGLE-2 modifies the sampling behavior of the target model.
It seems that EAGLE-2 turns the sampling strategy of the target model into the greedy sampling strategy of the draft model.
I am uncertain if I have overlooked any details in the paper that would clarify this conclusion. Could you assist me in understanding the precise mechanics of how EAGLE-2 operates to preserve the distribution while implementing top-k filtering and reranking? Your insight would be greatly appreciated in elucidating this aspect of EAGLE-2's functionality.