Closed · Monoclinic closed this issue 3 days ago
This is because EAGLE samples without replacement during the draft stage, which requires adjusting the distribution.
Hello, thanks for your reply. I understand that sampling without replacement requires an adjustment, but I'm confused about which of the two (top-k with the original softmax probabilities vs. sampling with the adjusted probabilities) is mathematically correct, i.e., consistent with the target LLM's distribution. Or are both correct? By the way, after the adjustment the draft model's probabilities are higher than the raw top-k probabilities; wouldn't that cause a slightly higher rejection rate?
Hello, what we need is the actual distribution from which each draft token is sampled, so adjustments are necessary.
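To illustrate the point above, here is a minimal sketch (not the repo's actual code; `draft_without_replacement` is a hypothetical helper) of why the adjusted distribution is the one each draft token is really drawn from: once a token is excluded, the next draw comes from the remaining mass, so its effective probability is the original softmax probability renormalized by that mass.

```python
import torch

def draft_without_replacement(logits, k):
    """Sample k distinct tokens; return the tokens and the adjusted
    probabilities they were actually drawn with."""
    probs = torch.softmax(logits, dim=-1)
    remaining = probs.clone()
    tokens, adj_probs = [], []
    for _ in range(k):
        mass = remaining.sum()                      # probability mass left
        token = torch.multinomial(remaining / mass, 1).item()
        adj_probs.append((remaining[token] / mass).item())
        remaining[token] = 0.0                      # exclude for the next draw
        tokens.append(token)
    return tokens, adj_probs

tokens, adj = draft_without_replacement(torch.tensor([2.0, 1.0, 0.5, 0.1]), 3)
# Each adjusted probability is >= the token's original softmax probability,
# since removing mass from the denominator can only raise it -- this is the
# "higher draft prob" mentioned in the question above.
```

The acceptance test in speculative sampling must use these adjusted probabilities, because they describe the distribution the draft token was actually sampled from.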
Hello, thanks for your reply. I've just read EAGLE-2 and found that the speedup ratio can exceed 4x. However, in the paper the maximum depth of the tree in the expansion stage is 4, so the theoretical upper bound on the speedup ratio should be 4x. Is something wrong?
The depth of the draft tree is 6 (see the appendix). Are you referring to the number 4 based on Figure 7 in our paper? It is a schematic diagram and does not represent the actual tree size.
Got that. Thank you!
Hello, I read the code and have a question here: https://github.com/SafeAILab/EAGLE/blob/cbc73dcb88bb7541c8f9a0f11f2468ec68c523b6/model/cnets.py#L718 It seems that the probabilities of tokens sampled from the draft model are not the original softmax probabilities; instead, they look like conditional probabilities (the probability of the top-2 token is calculated with the top-1 token excluded, and similarly for top-3, top-4, ...). Meanwhile, the base model's probabilities are computed directly by softmax: https://github.com/SafeAILab/EAGLE/blob/cbc73dcb88bb7541c8f9a0f11f2468ec68c523b6/model/utils.py#L376 I wonder why the draft model's probabilities are processed with a cumsum. Will this introduce a misalignment between the draft and base model probabilities?
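For reference, the cumsum adjustment being asked about can be sketched as follows (an illustrative example with made-up probabilities, not the repo's code): for top-k probabilities sorted in descending order, dividing each probability by one minus the cumulative sum of the earlier ones yields exactly the conditional probability of that token given the earlier tokens were already taken.

```python
import torch

# Sorted softmax probabilities of the top-4 tokens (illustrative values).
p = torch.tensor([0.5, 0.3, 0.15, 0.05])

# adjusted[i] = p[i] / (1 - (p[0] + ... + p[i-1])):
# the probability of token i under sampling without replacement,
# conditioned on tokens 0..i-1 having been removed.
cum = torch.cumsum(p, dim=0)
adjusted = p / (1 - torch.cat([torch.zeros(1), cum[:-1]]))
# adjusted: [0.5, 0.3/0.5 = 0.6, 0.15/0.2 = 0.75, 0.05/0.05 = 1.0]
```

Under this reading, the cumsum is just a vectorized way to compute the without-replacement conditional probabilities, which is consistent with the maintainers' answer that the acceptance test needs the distribution each draft token was actually sampled from.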