LMCache

Prefill LLMs only once, re-use KV across instances
https://lmcache.ai/
Apache License 2.0

KV cache loading shouldn't block other requests #139

Open YaoJiayi opened 3 weeks ago

YaoJiayi commented 3 weeks ago

We should find a scheduling algorithm that reduces GPU idle time while KV cache loads are in flight.
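One possible shape for such a scheduler is to keep requests whose KV cache is still loading out of the batchable pool, so the GPU keeps serving cache-ready requests instead of idling. The sketch below illustrates that idea only; all names (`Request`, `load_kv_cache`, `NonBlockingScheduler`) are hypothetical and not LMCache's actual API.

```python
import queue
import threading
import time


class Request:
    def __init__(self, rid, needs_cache):
        self.rid = rid
        self.needs_cache = needs_cache
        self.cache_ready = not needs_cache


def load_kv_cache(req):
    # Placeholder for a blocking fetch from a remote KV-cache backend.
    req.cache_ready = True


class NonBlockingScheduler:
    """Requests waiting on a cache load never enter the batch; the GPU
    keeps draining `ready` requests while loads happen in the background."""

    def __init__(self):
        self.ready = queue.Queue()

    def submit(self, req):
        if req.cache_ready:
            self.ready.put(req)
        else:
            # Fetch in a background thread; re-queue once the cache arrives,
            # so the load never blocks requests that are already runnable.
            threading.Thread(target=self._load, args=(req,), daemon=True).start()

    def _load(self, req):
        load_kv_cache(req)
        self.ready.put(req)

    def next_batch(self, max_size=8):
        batch = []
        while len(batch) < max_size and not self.ready.empty():
            batch.append(self.ready.get())
        return batch
```

A real engine would also need to bound the number of concurrent loads and preserve fairness, but the core point is that `next_batch` never waits on a cache fetch.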

YaoJiayi commented 2 weeks ago

Plan to solve it with an orchestrator.
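One way an orchestrator could help: route each request to an instance that already holds its KV cache (no load needed at all), and otherwise send it to the least-loaded instance, which fetches the cache in the background. This is a hypothetical sketch of that routing policy, not LMCache's actual design; `Orchestrator`, `route`, and the prefix-hash keying are all illustrative.

```python
class Orchestrator:
    def __init__(self, instances):
        # instance id -> set of prefix hashes cached on that instance
        self.cache_map = {i: set() for i in instances}
        # instance id -> number of in-flight requests
        self.load = {i: 0 for i in instances}

    def route(self, prefix_hash):
        # Prefer instances with a cache hit, so no KV load is needed;
        # among candidates, pick the least-loaded one.
        hits = [i for i, cached in self.cache_map.items() if prefix_hash in cached]
        pool = hits or list(self.load)
        target = min(pool, key=lambda i: self.load[i])
        self.load[target] += 1
        if not hits:
            # On a miss the target fetches and stores the cache,
            # making it a hit for future requests with this prefix.
            self.cache_map[target].add(prefix_hash)
        return target

    def finish(self, instance):
        self.load[instance] -= 1
```

Cache-affinity routing like this trades perfect load balance for fewer cache loads; a production policy would likely weigh both.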