skykiseki opened 1 month ago
The README mentions "widely-used libraries for memory-efficient inference such as FlashAttention and vLLM."
However, does vLLM currently support ChunkLlama-DCA? It does not appear to be supported at the moment.