feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

KV-cache for DeepMind's speculative sampling #26

Closed briancpark closed 4 months ago

briancpark commented 5 months ago

Hi, I noticed your implementation of speculative sampling uses a KV-cache for Google's version, but not for DeepMind's version.

I was curious whether you found a limitation with enabling the KV-cache for DeepMind's version.

feifeibear commented 4 months ago

I found no limitation with DeepMind's version. I simply didn't implement it, for convenience.
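
For reference, here is a minimal sketch (not this repo's actual code) of how a KV-cache could be threaded through DeepMind-style speculative sampling. It assumes two Hugging Face causal LMs (`target` and `draft` are placeholder names, as are `gamma`, `speculative_step`, and the helpers) and the legacy tuple `past_key_values` format, where each layer holds `(key, value)` tensors shaped `[batch, heads, seq_len, head_dim]`; newer `Cache` objects would need their own rollback method.

```python
import torch


def _crop_kv(past_key_values, length):
    """Roll the cache back so it covers only the first `length` positions."""
    return tuple(
        (k[:, :, :length, :], v[:, :, :length, :]) for k, v in past_key_values
    )


def _cache_len(past_key_values):
    return 0 if past_key_values is None else past_key_values[0][0].shape[2]


@torch.no_grad()
def speculative_step(target, draft, input_ids, target_kv=None, draft_kv=None, gamma=4):
    """One accept/reject round; returns the extended ids and both caches."""
    n_prefix = input_ids.shape[1]

    # 1. Draft model proposes `gamma` tokens, extending its cache as it goes.
    draft_ids, draft_probs = input_ids, []
    for _ in range(gamma):
        out = draft(draft_ids[:, _cache_len(draft_kv):],
                    past_key_values=draft_kv, use_cache=True)
        draft_kv = out.past_key_values
        p = torch.softmax(out.logits[:, -1, :], dim=-1)
        draft_probs.append(p)
        draft_ids = torch.cat([draft_ids, torch.multinomial(p, 1)], dim=-1)

    # 2. Target model scores all proposed tokens in one forward pass,
    #    feeding only the suffix its cache has not seen yet.
    out = target(draft_ids[:, _cache_len(target_kv):],
                 past_key_values=target_kv, use_cache=True)
    target_kv = out.past_key_values
    q = torch.softmax(out.logits[:, -(gamma + 1):, :], dim=-1)

    # 3. Accept draft token i with probability min(1, q(x)/p(x));
    #    stop at the first rejection.
    n_accepted = 0
    for i in range(gamma):
        tok = draft_ids[0, n_prefix + i]
        ratio = (q[0, i, tok] / draft_probs[i][0, tok]).clamp(max=1.0)
        if torch.rand(()).item() < ratio:
            n_accepted += 1
        else:
            break

    # 4. Roll BOTH caches back to the accepted prefix. This slice is the
    #    only extra bookkeeping the rejection step requires.
    keep = n_prefix + n_accepted
    output_ids = draft_ids[:, :keep]
    target_kv = _crop_kv(target_kv, keep)
    draft_kv = _crop_kv(draft_kv, keep)  # no-op if the draft cache is shorter

    # 5. Sample one extra token: from the residual (q - p)+ on rejection, or
    #    from the target's bonus distribution if everything was accepted.
    if n_accepted < gamma:
        residual = (q[0, n_accepted] - draft_probs[n_accepted][0]).clamp(min=0)
        residual = residual / residual.sum()
    else:
        residual = q[0, gamma]
    next_tok = torch.multinomial(residual, 1).view(1, 1)
    return torch.cat([output_ids, next_tok], dim=-1), target_kv, draft_kv
```

The relevant part is step 4: a rejection only requires slicing the cached key/value tensors back to the accepted prefix, so nothing in DeepMind's accept/reject scheme prevents caching, which is consistent with the answer above.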