skykiseki opened 1 month ago
The README mentions "widely-used libraries for memory-efficient inference such as FlashAttention and vLLM."
However, does vLLM currently support ChunkLlama-DCA? It does not appear to be supported at the moment.