Closed briancpark closed 4 months ago
Hi, I noticed that your speculative sampling implementation uses a KV cache for Google's version but not for DeepMind's version.
I was curious whether you found a limitation with enabling the KV cache for DeepMind's version?
I found no limitations with DeepMind's version. I left the KV cache out of it just for convenience.