Infini-AI-Lab / TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
https://infini-ai-lab.github.io/TriForce/

how to change y1? #6

Open Lucas-TY opened 5 months ago

Lucas-TY commented 5 months ago

It seems like `gamma` is $\gamma_2$, but how do you change $\gamma_1$?

preminstrel commented 5 months ago

Hello,

Thanks for your interest in our work! In our provided implementation, we set $\gamma_1 = 1$ because we observed nearly the same performance with $\gamma_1 = 2$ and lower performance with larger values of $\gamma_1$. This is due to the low acceptance rate of Llama-68M. To keep things simple, our open-source code uses $\gamma_1 = 1$.

If you’d like to try using better draft models with higher acceptance rates, you can directly modify the function linked below. You only need to add an extra inner loop for $\gamma_1$:

https://github.com/Infini-AI-Lab/TriForce/blob/e865a1df7ded2b43bc309106c05371c429fc10f1/utils/decoding.py#L182-L222
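As a rough illustration of where such an inner loop would sit, here is a minimal, self-contained sketch. It does not use the TriForce code: the names `verify`, `draft_with_tiny`, `generate`, and the toy models `tiny`, `middle`, and `target` are all hypothetical. It also assumes greedy token matching rather than probabilistic verification, and it ignores KV caches entirely.

```python
from typing import Callable, List

# Toy stand-in for a model: maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]

def verify(prefix: List[int], proposed: List[int], verifier: NextToken) -> List[int]:
    """Greedy verification: keep proposed tokens until the first mismatch,
    then append one token from the verifier (correction or bonus token)."""
    accepted: List[int] = []
    for tok in proposed:
        expected = verifier(prefix + accepted)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [verifier(prefix + accepted)]

def draft_with_tiny(prefix, tiny, middle, gamma1, gamma2):
    """Middle level: collect gamma2 draft tokens for the target model."""
    drafted: List[int] = []
    while len(drafted) < gamma2:
        ctx = prefix + drafted
        proposal: List[int] = []
        for _ in range(gamma1):  # the extra inner loop for gamma_1
            proposal.append(tiny(ctx + proposal))
        drafted += verify(ctx, proposal, middle)
    return drafted[:gamma2]

def generate(prefix, tiny, middle, target, gamma1, gamma2, max_new):
    """Top level: the target model verifies the gamma2 drafted tokens."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        drafted = draft_with_tiny(out, tiny, middle, gamma1, gamma2)
        out += verify(out, drafted, target)
    return out[: len(prefix) + max_new]

def greedy(prefix, model, max_new):
    """Plain autoregressive decoding, used as the reference output."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out

# Hypothetical toy models: each smaller model disagrees with the next
# larger one at some positions, so some drafts get rejected.
def target(seq): return (sum(seq) * 31 + 7 * len(seq)) % 50
def middle(seq): return target(seq) if len(seq) % 5 else (target(seq) + 1) % 50
def tiny(seq): return middle(seq) if len(seq) % 3 else (middle(seq) + 1) % 50

if __name__ == "__main__":
    reference = greedy([1, 2, 3], target, 20)
    for g1 in (1, 2, 4):
        assert generate([1, 2, 3], tiny, middle, target, g1, 6, 20) == reference
```

In this sketch the output is the same for every value of `gamma1`, since each level only keeps tokens its verifier would have produced itself. Only the number of calls to each model changes, and that is where a draft model's acceptance rate would matter.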

If you have any further questions, feel free to ask.