feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding
Apache License 2.0

Use KV cache for approx model generation #3

Closed by feifeibear 1 year ago
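
The issue asks for the approximate (draft) model to reuse its KV cache while generating draft tokens. Below is a minimal sketch of that idea, not the repo's actual implementation: it assumes a Hugging Face-style causal LM (the "gpt2" checkpoint, the `draft_generate` helper, and the `gamma` parameter are placeholders), and it feeds only the newest token back into the model once a cache exists.

```python
# Sketch only: KV-cache reuse for the draft model's autoregressive generation.
# Assumes a Hugging Face causal LM; model name and helper names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder draft model
approx_model = AutoModelForCausalLM.from_pretrained("gpt2")
approx_model.eval()

@torch.no_grad()
def draft_generate(prefix_ids: torch.Tensor, gamma: int = 4):
    """Generate `gamma` draft tokens, re-encoding only the newest token once a KV cache exists."""
    past_key_values = None
    input_ids = prefix_ids
    generated = prefix_ids
    for _ in range(gamma):
        outputs = approx_model(input_ids=input_ids,
                               past_key_values=past_key_values,
                               use_cache=True)
        past_key_values = outputs.past_key_values            # carry the cache to the next step
        next_token = outputs.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_token], dim=-1)
        input_ids = next_token                                # only the new token is fed next time
    return generated, past_key_values

prefix = tokenizer("Speculative decoding is", return_tensors="pt").input_ids
draft_ids, cache = draft_generate(prefix)
print(tokenizer.decode(draft_ids[0]))
```

With the cache, each draft step attends over cached keys/values instead of re-running the full prefix, which is the usual motivation for enabling KV caching in the draft loop of speculative decoding.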