Aniketto16 opened this issue 5 months ago
We are not very familiar with vLLM and its internal mechanisms. We will check its compatibility with SelfExtend. Thanks for your suggestion!
+1, would love to see this in vLLM!
+1, would love to see this in vLLM too!
+1, would love to see this in vLLM, since lots of online services are built on it! It would be ideal if we could easily use the SelfExtend trick in our online service.
Hello! Thank you for your great work; it's amazing how much effort you put into this algorithm. I just had one question: is it possible to integrate this with vLLM serving?
This would really help inference in limited-resource settings once you cross the 8192-token mark. Is there a way to do this? Thank you in advance for your help!
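For anyone skimming the thread: the core of SelfExtend is a remapping of relative positions, so tokens beyond a neighbor window reuse position IDs the model saw during pretraining. Below is a minimal sketch of that remapping, not vLLM's or the authors' actual implementation; the function name and the `group`/`window` parameter names are illustrative assumptions.

```python
# Illustrative sketch of SelfExtend-style grouped position remapping.
# `group` (group size) and `window` (neighbor window) are assumed names,
# not taken from the SelfExtend or vLLM codebases.
def self_extend_rel_pos(i: int, j: int, group: int = 4, window: int = 512) -> int:
    """Relative position a query at index i assigns to a key at index j (j <= i)."""
    d = i - j
    if d < window:
        # Neighbor attention: nearby tokens keep their exact relative positions.
        return d
    # Grouped attention: distant tokens share floor-divided positions,
    # shifted so the grouped region joins up with the neighbor window.
    return (i // group) - (j // group) + window - window // group

# Nearby token: exact distance is preserved.
print(self_extend_rel_pos(1000, 999))  # -> 1
# Distant token: the remapped position stays far below the raw distance of 8192,
# keeping it inside the range the model was pretrained on.
print(self_extend_rel_pos(8192, 0))    # -> 2432
```

The key property is that remapped distances grow `group` times more slowly past the window, which is why a model pretrained on 8192 tokens can attend over much longer inputs without retraining.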