Open je1lee opened 5 months ago
Is there any way to batch generate with the paged KV cache in its current state? If not, do you plan to support it?
Tasks such as site recognition, classification, and object counting that rely on an LLM's general knowledge would run faster with batch generation support.