Open wjj19950828 opened 4 days ago
Currently, I want to support batch inference for the LLM part, but I have some questions about the final sampler and how to support continuous batching.
I hope the LLM part can be connected to the vLLM ecosystem.
Can you provide some suggestions? Thank you~
@aluminumbox Do you have any suggestions? Thx~
Yes, you can, with some modifications: for example, keep track of which batch indices are still decoding. We have not had time to support it yet.
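A minimal sketch of what "keep track of which indices are still decoding" could look like in a batched autoregressive loop. This is only an illustration: `forward_one_step`, `eos_token_id`, and the sampling step are assumed placeholders, not the project's actual API, so adapt the names to the real LLM module.

```python
import torch

def batch_decode(model, prompt_tokens: torch.Tensor, eos_token_id: int,
                 max_new_tokens: int = 512):
    """prompt_tokens: (batch, prompt_len) int64 tensor of prompt token ids."""
    batch_size = prompt_tokens.size(0)
    device = prompt_tokens.device
    # True for sequences that have not emitted EOS yet.
    active = torch.ones(batch_size, dtype=torch.bool, device=device)
    generated = [[] for _ in range(batch_size)]
    tokens = prompt_tokens

    for _ in range(max_new_tokens):
        # Hypothetical single-step forward returning logits for the last position.
        logits = model.forward_one_step(tokens)              # (batch, vocab)
        probs = torch.softmax(logits, dim=-1)
        next_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
        # Finished sequences keep emitting EOS so the batch stays aligned.
        next_tokens = torch.where(
            active, next_tokens,
            torch.full_like(next_tokens, eos_token_id),
        )
        for i in range(batch_size):
            if active[i]:
                generated[i].append(next_tokens[i].item())
        active &= next_tokens != eos_token_id
        if not active.any():
            break
        tokens = torch.cat([tokens, next_tokens.unsqueeze(-1)], dim=1)
    return generated
```

For true continuous batching (as in vLLM), finished sequences would be evicted and new requests inserted between steps instead of being padded with EOS, but the same "active index" bookkeeping is the starting point.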