EmbeddedLLM / vllm-rocm

vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
https://vllm.readthedocs.io
Apache License 2.0

[Do Not Merge] Change Highlight to Prepare Merging into vLLM Main #13

Closed: tjtanaa closed this 7 months ago

fxmarty commented 7 months ago

Hi @tjtanaa, I am wondering if the work being done in this repo is different (kernel-wise) from https://github.com/vllm-project/vllm/pull/1313?

Thank you!

kliuae commented 6 months ago

Hi @fxmarty, the kernels in the v0.2.x ports were built on top of vllm-project#1313, with some modifications so that they compile in our environments, plus the inclusion of the SqueezeLLM quantization kernels. Thank you!
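
For context, a minimal sketch of what gating a build on a ROCm environment typically looks like on the PyTorch side. Detecting HIP via `torch.version.hip` is a standard pattern, but the flag names below are hypothetical and this is not the actual vllm-rocm build configuration:

```python
# Illustrative sketch only, not the vllm-rocm build script.
# Shows how a PyTorch-based build can branch between ROCm and CUDA
# when deciding which kernel flags (hypothetical here) to compile with.
import torch

def is_rocm_build() -> bool:
    # torch.version.hip is a version string on ROCm builds of PyTorch
    # and None on CUDA builds.
    return getattr(torch.version, "hip", None) is not None

# Hypothetical compile flags; real projects would also swap kernel
# source lists and include paths here.
extra_compile_args = ["-DUSE_ROCM"] if is_rocm_build() else ["-DUSE_CUDA"]
print("ROCm build" if is_rocm_build() else "CUDA build", extra_compile_args)
```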

fxmarty commented 6 months ago

I see, thank you!