diaoyingyu closed this issue 4 months ago
Hello, thanks for your interest in our work!
Yes, you can. In our provided implementation, we set $\gamma_1 = 1$ because we observed that the performance is nearly the same for $\gamma_1 = 2$, and it decreases for larger values of $\gamma_1$. This is due to the low acceptance rate for Llama-68M. To keep things simple, our open-source code uses $\gamma_1 = 1$. If you’d like to try using better draft models with higher acceptance rates, you can directly modify the function linked below. You only need to add an extra inner loop for $\gamma_1$.
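For readers following along, here is a minimal toy sketch of what an inner drafting loop over $\gamma_1$ could look like. It is not the repository's code: `speculate_step`, `draft_next`, and `target_next` are hypothetical names, the "models" are plain Python functions that return a greedy next token, and verification is done token by token for readability (a real implementation would verify all proposals in one forward pass).

```python
from typing import Callable, List

Token = int
# Hypothetical stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculate_step(
    prefix: List[Token],
    draft_next: NextToken,
    target_next: NextToken,
    gamma_1: int,
) -> List[Token]:
    """Run one draft-then-verify step and return the tokens to append to `prefix`."""
    # Inner loop: the draft model proposes gamma_1 tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(gamma_1):
        proposed.append(draft_next(prefix + proposed))

    # Greedy verification: keep proposals while they match the verifier's choice.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: take the verifier's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every proposal matched, so the verifier contributes one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


# Toy models (assumptions for illustration only): the verifier counts upward,
# and the draft agrees except when the sequence has exactly two tokens.
def toy_target(seq: List[Token]) -> Token:
    return len(seq) % 7


def toy_draft(seq: List[Token]) -> Token:
    return 99 if len(seq) == 2 else len(seq) % 7
```

With $\gamma_1 = 1$ the inner loop runs once, which matches the simplified setting described above; larger values only pay off when the draft's proposals are accepted often enough.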
As we expected, since the draft model is quite small (68M) and has only limited local information (StreamingLLM), the acceptance rate is low: only about 0.35.
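To see roughly why a longer draft helps little at this acceptance rate, the sketch below uses the standard geometric-series estimate of tokens produced per verification step, $(1 - \alpha^{\gamma+1}) / (1 - \alpha)$. It assumes each drafted token is accepted independently with the same probability $\alpha$, which is a simplification, and `expected_tokens` is a hypothetical helper name, not something from the repository.

```python
def expected_tokens(alpha: float, gamma: int) -> float:
    """Expected tokens per verification step for draft length `gamma`.

    Assumes every drafted token is accepted independently with probability
    `alpha` (a simplifying assumption, not a measured property).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every draft token is accepted, plus one token from the verifier.
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Acceptance rate of about 0.35, as reported in this thread.
    for gamma in (1, 2, 3, 4):
        print(gamma, f"{expected_tokens(0.35, gamma):.2f}")
```

Under this assumption, going from $\gamma_1 = 1$ to $\gamma_1 = 2$ raises the estimate from about 1.35 to about 1.47 tokens per step, and further increases add even less, while every extra drafted token still costs a draft forward pass.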
If you have any further questions, feel free to ask.
Got it! Thanks for your reply :)
Hi, thanks for the great work! I'm trying to understand the TriForce method, but I'm confused about the middle speculation.
Thanks