HKUNLP / ChunkLlama

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Apache License 2.0

Does it support vLLM now? #18

Open skykiseki opened 1 month ago

skykiseki commented 1 month ago

The README mentions "widely-used libraries for memory-efficient inference such as FlashAttention and vLLM."

However, I would like to know whether vLLM currently supports ChunkLlama's DCA, as it does not appear to be supported at the moment.
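
For context, a minimal sketch of what stock vLLM inference looks like today, assuming vLLM is installed and a standard HF Llama checkpoint is available (the model name and `max_model_len` here are illustrative). This path uses vLLM's built-in attention kernels only; nothing in it applies ChunkLlama's Dual Chunk Attention, which is why DCA support would need explicit integration on vLLM's side:

```python
# Sketch: plain vLLM inference on a Llama checkpoint.
# NOTE: this does NOT apply ChunkLlama's DCA; it only shows the
# standard vLLM path that the question is asking about.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # illustrative checkpoint
    max_model_len=16384,                    # illustrative extended window
)
params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Summarize the following document: ..."], params)
print(outputs[0].outputs[0].text)
```

Beyond this vanilla setup, running DCA would require vLLM's attention implementation itself to handle the chunked relative positions, which is not something a user-side script can patch in.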