**Closed** (bxyb closed this issue 2 months ago)
Hello, thanks for your interest in our research.
Yes, it means speculative decoding without the Llama-68M model and StreamingLLM (i.e., Llama-7B-128K w/ retrieval cache --spec--> Llama-7B-128K w/ full cache).
So it is essentially an ablation experiment that confirms the hierarchical speculation design works.
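To make the ablation concrete, here is a minimal sketch of one-level speculative decoding in the spirit of "Retrieval w/o Hierarchy": the same model drafts with a retrieval (partial) KV cache and is verified against the full KV cache, with no small draft model in between. This is a toy illustration, not the TriForce implementation; the two next-token functions below are stand-ins (the "draft" only sees a short tail of the context, mimicking a sparse cache).

```python
def target_next(ctx):
    # "Full cache" model: next token depends on the whole context (toy rule).
    return (sum(ctx) + len(ctx)) % 7

def draft_next(ctx, window=4):
    # "Retrieval cache" model: approximates the target using only the
    # last `window` tokens, standing in for a sparse/retrieved KV cache.
    tail = ctx[-window:]
    return (sum(tail) + len(ctx)) % 7

def speculative_step(ctx, gamma=4):
    """Draft `gamma` tokens cheaply, then verify greedily with the target.

    Returns the accepted draft prefix plus one token from the target
    (the correction / bonus token), so output always matches what the
    target alone would have decoded greedily.
    """
    # 1) Draft phase: propose gamma tokens with the cheap model.
    draft, cur = [], list(ctx)
    for _ in range(gamma):
        t = draft_next(cur)
        draft.append(t)
        cur.append(t)
    # 2) Verify phase: keep the longest prefix the target agrees with.
    accepted, cur = [], list(ctx)
    for t in draft:
        if target_next(cur) == t:
            accepted.append(t)
            cur.append(t)
        else:
            break
    # 3) Target always contributes one token past the accepted prefix.
    accepted.append(target_next(cur))
    return accepted

print(speculative_step([1, 2, 3]))  # → [2, 5, 4]
```

The acceptance-plus-correction step guarantees losslessness under greedy decoding: the emitted tokens are exactly what the full-cache target would have produced, and the speedup comes from how many drafted tokens get accepted per verification pass.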
I have a question about the results in the paper.

![image](https://github.com/Infini-AI-Lab/TriForce/assets/50622684/d69216c5-1b99-466e-b1e6-b1134b140abc)
Does "Retrieval w/o Hierarchy" test plain speculative decoding (retrieval cache drafting for the full cache)?