Infini-AI-Lab / TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
https://infini-ai-lab.github.io/TriForce/

Does Retrieval w/o Hierarchy test with spec decoding? #4

Closed bxyb closed 2 months ago

bxyb commented 2 months ago

I have a question about the results in the paper. [image]

Is "Retrieval w/o Hierarchy" tested with normal speculative decoding (Retrieval cache + Full cache)?

preminstrel commented 2 months ago

Hello, thanks for your interest in our research.

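Yes, it means speculative decoding without the llama-68M model w/ StreamingLLM. Llama-7B-128K w/ retrieval cache is the draft and Llama-7B-128K w/ full cache is the target (Llama-7B-128K w/ retrieval cache --spec--> Llama-7B-128K w/ full cache).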

So it is basically an ablation experiment to verify that hierarchical speculation works.
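
For readers unfamiliar with the setup, the single-level pipeline described above (one draft, one target, no llama-68M stage) can be sketched roughly as follows. This is a toy illustration and not the TriForce code: `target_next`, `draft_next`, `speculative_generate`, `greedy_generate`, `gamma`, and `budget` are all hypothetical names, the "models" are small arithmetic rules, and the draft simply sees a truncated context as a stand-in for a retrieval cache (the real cache selection is not modelled). It uses greedy verification, so the output matches plain greedy decoding with the target.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    # Toy stand-in for the target (full cache): may look at the whole context.
    if len(ctx) % 4 == 0:
        return (3 * ctx[-1] + ctx[0]) % 17
    return (3 * ctx[-1] + 1) % 17


def draft_next(ctx: List[int], budget: int = 6) -> int:
    # Toy stand-in for the draft (limited cache): the same rule, but it only
    # sees the last `budget` tokens, so it sometimes disagrees with the target.
    return target_next(ctx[-budget:])


def greedy_generate(prompt: List[int], target: NextToken, max_new_tokens: int) -> List[int]:
    # Baseline: ordinary greedy decoding with the target only.
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_generate(
    prompt: List[int],
    draft: NextToken,
    target: NextToken,
    max_new_tokens: int,
    gamma: int = 4,
) -> Tuple[List[int], int]:
    # Single-level speculative decoding with greedy verification.
    tokens = list(prompt)
    accepted_total = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # Draft phase: propose `gamma` tokens autoregressively.
        proposal: List[int] = []
        for _ in range(gamma):
            proposal.append(draft(tokens + proposal))
        # Verify phase: keep the longest prefix the target agrees with.
        # (A real system scores all positions in one target forward pass.)
        n_ok = 0
        for i in range(gamma):
            if proposal[i] != target(tokens + proposal[:i]):
                break
            n_ok += 1
        tokens.extend(proposal[:n_ok])
        accepted_total += n_ok
        # The target always adds one token: a correction after a rejection,
        # or a bonus token when every drafted token was accepted.
        tokens.append(target(tokens))
    return tokens[: len(prompt) + max_new_tokens], accepted_total


if __name__ == "__main__":
    prompt = [2, 5, 7]
    spec_tokens, accepted = speculative_generate(prompt, draft_next, target_next, 20)
    print("lossless:", spec_tokens == greedy_generate(prompt, target_next, 20))
    print("accepted draft tokens:", accepted)
```

In the hierarchical setting, the draft itself would be accelerated by a further, cheaper draft stage; the ablation removes that extra stage and keeps only the loop shown here.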