LMCache

Prefill LLMs only once, re-use KV across instances
https://lmcache.ai/
Apache License 2.0

KV cache loading shouldn't block other requests #139

Open YaoJiayi opened 3 weeks ago

YaoJiayi commented 3 weeks ago

We should find a scheduling algorithm that reduces GPU idle time while KV cache loads are in flight.
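One possible shape for such a scheduler is to keep requests whose KV cache is still loading out of the batchable pool, so the GPU keeps serving cache-ready requests instead of idling. The sketch below illustrates that idea only; all names (`Request`, `load_kv_cache`, `NonBlockingScheduler`) are hypothetical and not LMCache's actual API.

```python
import queue
import threading
import time


class Request:
    def __init__(self, rid, needs_cache):
        self.rid = rid
        self.needs_cache = needs_cache
        self.cache_ready = not needs_cache


def load_kv_cache(req):
    # Placeholder for a blocking fetch from a remote KV-cache backend.
    req.cache_ready = True


class NonBlockingScheduler:
    """Requests waiting on a cache load never enter the batch; the GPU
    keeps draining `ready` requests while loads happen in the background."""

    def __init__(self):
        self.ready = queue.Queue()

    def submit(self, req):
        if req.cache_ready:
            self.ready.put(req)
        else:
            # Fetch in a background thread; re-queue once the cache arrives,
            # so the load never blocks requests that are already runnable.
            threading.Thread(target=self._load, args=(req,), daemon=True).start()

    def _load(self, req):
        load_kv_cache(req)
        self.ready.put(req)

    def next_batch(self, max_size=8):
        batch = []
        while len(batch) < max_size and not self.ready.empty():
            batch.append(self.ready.get())
        return batch
```

A real engine would also need to bound the number of concurrent loads and preserve fairness, but the core point is that `next_batch` never waits on a cache fetch.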

YaoJiayi commented 2 weeks ago

Plan to solve it with an orchestrator.
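One way an orchestrator could help: route each request to an instance that already holds its KV cache (no load needed at all), and otherwise send it to the least-loaded instance, which fetches the cache in the background. This is a hypothetical sketch of that routing policy, not LMCache's actual design; `Orchestrator`, `route`, and the prefix-hash keying are all illustrative.

```python
class Orchestrator:
    def __init__(self, instances):
        # instance id -> set of prefix hashes cached on that instance
        self.cache_map = {i: set() for i in instances}
        # instance id -> number of in-flight requests
        self.load = {i: 0 for i in instances}

    def route(self, prefix_hash):
        # Prefer instances with a cache hit, so no KV load is needed;
        # among candidates, pick the least-loaded one.
        hits = [i for i, cached in self.cache_map.items() if prefix_hash in cached]
        pool = hits or list(self.load)
        target = min(pool, key=lambda i: self.load[i])
        self.load[target] += 1
        if not hits:
            # On a miss the target fetches and stores the cache,
            # making it a hit for future requests with this prefix.
            self.cache_map[target].add(prefix_hash)
        return target

    def finish(self, instance):
        self.load[instance] -= 1
```

Cache-affinity routing like this trades perfect load balance for fewer cache loads; a production policy would likely weigh both.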