dusty-nv / NanoLLM

Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
https://dusty-nv.github.io/NanoLLM/
MIT License
196 stars, 31 forks

Is batch generation available? #21

Open je1lee opened 4 months ago

je1lee commented 4 months ago

Is there any way to do batch generation with the paged KV cache in the current state? If not, do you have plans to support it?

Tasks such as site recognition, classification, and object counting, which rely on the LLM's general knowledge, would run faster with batch generation support.