Closed by chizhang118 2 months ago
Prefill and decoding disaggregation requires good pipeline parallelism (PP) performance. Other projects, such as SwiftTransformer, already support PP. Maybe we could try to tune the PP performance.
As far as I remember, vllm has not implemented PP support yet. Do you mean adding PP support to vllm?
Anything you want to discuss about vllm.
Check recent paper lists and figure out possible interesting areas to work on.