feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Remove past_key_values usage because it leads to wrong answers #5

Closed: feifeibear closed this issue 9 months ago.