EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Speed up speculative decoding implementation #291

Closed by EricLBuehler 4 months ago

EricLBuehler commented 4 months ago

As the title says: the speculative decoding implementation works, but it should be sped up.

EricLBuehler commented 4 months ago

Work in #296.