HabanaAI/vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai · Apache License 2.0 · 43 stars · 58 forks
Add FP8 inference procedure #504
Closed · afierka-intel closed this 1 week ago