Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Implement prompt caching to speed up inference #1638

Open rasbt opened 3 months ago

rasbt commented 3 months ago

In addition to KV-caching, it makes sense to also add prompt caching. I.e., instead of re-computing the system prompt on every request, we can cache the prefilled prompt to avoid the recalculation.
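For anyone wanting to picture the idea: below is a minimal, self-contained sketch (not litgpt's actual API; all names like `attend`, `W_q`, and `prompt_cache` are illustrative) of a toy single-head attention layer where the system prompt's keys/values are prefilled once and reused across requests, so only the new user tokens are encoded per request.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy single-head attention "model": an embedding plus one attention layer.
# These weights are illustrative stand-ins, not litgpt internals.
vocab_size, d = 100, 16
embed = torch.nn.Embedding(vocab_size, d)
W_q = torch.nn.Linear(d, d, bias=False)
W_k = torch.nn.Linear(d, d, bias=False)
W_v = torch.nn.Linear(d, d, bias=False)

def attend(tokens, kv_cache=None):
    """Run causal attention over `tokens`, reusing cached (k, v) if given.

    Returns the outputs for the new tokens plus the updated cache, so a
    cached system-prompt prefix never has to be re-encoded.
    """
    x = embed(tokens)                       # (T_new, d)
    q, k, v = W_q(x), W_k(x), W_v(x)
    if kv_cache is not None:
        k = torch.cat([kv_cache[0], k])     # prepend cached keys
        v = torch.cat([kv_cache[1], v])     # prepend cached values
    T_new, T_total = q.shape[0], k.shape[0]
    # Causal mask: new token i may attend to all cached positions plus
    # new positions up to and including itself.
    offset = T_total - T_new
    mask = torch.arange(T_total) <= (torch.arange(T_new)[:, None] + offset)
    scores = (q @ k.T) / d**0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    out = F.softmax(scores, dim=-1) @ v
    return out, (k, v)

system_prompt = torch.tensor([1, 2, 3, 4, 5])   # fixed prefix tokens

# Prefill once: encode the system prompt and keep its (k, v) tensors.
_, prompt_cache = attend(system_prompt)

# Per request: only the new user tokens run through the model; the
# system prompt's keys/values come from the cache.
user_tokens = torch.tensor([7, 8])
out_cached, _ = attend(user_tokens, kv_cache=prompt_cache)

# Sanity check: matches recomputing the full sequence from scratch.
out_full, _ = attend(torch.cat([system_prompt, user_tokens]))
assert torch.allclose(out_cached, out_full[-2:], atol=1e-6)
print("cached prefix output matches full recompute")
```

In a real implementation the cache would be keyed by the prompt (or a hash of it) and would hold the per-layer KV tensors produced during prefill, but the reuse pattern is the same as in this sketch.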

02shanks commented 3 months ago

@rasbt is this open for contribution? If so, can you guide me through?