Closed je1lee closed 9 months ago
Would it be possible to change choices.py to build a different tree architecture?
Of course. You can refer to #6.
Is there any way I could try it without the tree-related decoding process?
A chain structure is a special type of tree, and [[0], [0,0], ...] represents a chain structure.
Also, do you have any results on how many draft tokens are verified per base-model verification?
If you use the UI inference, the 'compression ratio' box in the top-right corner displays what you are looking for.
This is the experimental result on MT-bench.
Model | Compression Ratio | Model | Compression Ratio |
---|---|---|---|
Vicuna 7B | 3.94 | LLaMA2-Chat 7B | 3.62 |
Vicuna 13B | 3.98 | LLaMA2-Chat 13B | 3.90 |
Vicuna 33B | 3.68 | LLaMA2-Chat 70B | 3.80 |
Thanks for the reply!! It achieves a good acceleration rate even without tree decoding, and I'm very impressed that a single decoder layer can perform this well. Do you have any academic references backing your architectural choice (a single decoder layer) for the draft model? What is mentioned in the blog seems a little ambiguous to me.
Thank you for your interest. We are currently writing a paper on EAGLE that discusses its structure and other issues. Once the paper is completed, I will post the link here.
Thanks for the reply! Then may I ask a few more questions about the blog?
Is "simple" at line 10 a typo for "sample"? If so, why is x sampled from q rather than from p?
What does n stand for? Every token composing the tree? Would n be 10 in the case of the tree shape in Figure 3?
Thank you very much! It is a typo, x is sampled from p.
n represents the number of child nodes of the current node. For Figure 3, if the current node is "I", then n=2, x1 is "may", and x2 is "help".