Speed up speculative decoding implementation

EricLBuehler / mistral.rs

Blazingly fast LLM inference.

MIT License

Closed: EricLBuehler closed this issue 4 months ago

EricLBuehler commented 4 months ago

As described in the title: the speculative decoding implementation works, but it should be sped up.

EricLBuehler commented 4 months ago

The work is in #296.