Open wjj19950828 opened 4 days ago
Currently, I want to support batch inference for the LLM part, but I have some questions about the final sampler and how to support continuous batching.
I hope the LLM part can be connected to the vLLM ecosystem.
Can you provide some suggestions? Thank you~
@aluminumbox Do you have any suggestions? Thx~
Yes, you can, with some modifications: for example, keep track of which batch indices are still decoding. We have not had time to support it yet.
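A minimal sketch of what "keep track of which indices are still decoding" could look like in a batched autoregressive loop. This is only an illustration: `forward_one_step`, `eos_token_id`, and the sampling step are assumed placeholders, not the project's actual API, so adapt the names to the real LLM module.

```python
import torch

def batch_decode(model, prompt_tokens: torch.Tensor, eos_token_id: int,
                 max_new_tokens: int = 512):
    """prompt_tokens: (batch, prompt_len) int64 tensor of prompt token ids."""
    batch_size = prompt_tokens.size(0)
    device = prompt_tokens.device
    # True for sequences that have not emitted EOS yet.
    active = torch.ones(batch_size, dtype=torch.bool, device=device)
    generated = [[] for _ in range(batch_size)]
    tokens = prompt_tokens

    for _ in range(max_new_tokens):
        # Hypothetical single-step forward returning logits for the last position.
        logits = model.forward_one_step(tokens)              # (batch, vocab)
        probs = torch.softmax(logits, dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
        # Finished sequences keep emitting EOS so the batch stays aligned.
        next_tokens = torch.where(
            active, next_tokens,
            torch.full_like(next_tokens, eos_token_id),
        )
        for i in range(batch_size):
            if active[i]:
                generated[i].append(next_tokens[i].item())
        active &= next_tokens != eos_token_id
        if not active.any():
            break
        tokens = torch.cat([tokens, next_tokens.unsqueeze(-1)], dim=1)
    return generated
```

For true continuous batching (as in vLLM), finished sequences would be evicted and new requests inserted between steps instead of being padded with EOS, but the same "active index" bookkeeping is the starting point.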