flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

[Feature Request] Versatile head dimension #142

Open yzh119 opened 6 months ago

yzh119 commented 6 months ago

Support more head dimensions: 64/128/256.

zhuohan123 commented 6 months ago

Thanks @yzh119! Currently vLLM supports the following head sizes, all multiples of 16: 64/80/96/112/128/256.
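For context, the head dimension is simply `hidden_size / num_attention_heads`, which is where sizes like 64, 128, and 256 come from. A minimal sketch (the model configs below are published figures, listed only for illustration; they are not taken from this thread):

```python
# Head dimension = hidden_size / num_attention_heads.
# Example configs (published model hyperparameters, for illustration only):
#   GPT-2:      hidden 768,  12 heads -> head_dim 64
#   LLaMA-2-7B: hidden 4096, 32 heads -> head_dim 128
#   GPT-J-6B:   hidden 4096, 16 heads -> head_dim 256

def head_dim(hidden_size: int, num_heads: int) -> int:
    # Head dim must divide the hidden size evenly.
    assert hidden_size % num_heads == 0
    return hidden_size // num_heads

configs = {
    "gpt2": (768, 12),
    "llama-2-7b": (4096, 32),
    "gpt-j-6b": (4096, 16),
}

for name, (hidden, heads) in configs.items():
    print(name, head_dim(hidden, heads))
```

A kernel library typically compiles a separate template instantiation per supported head dimension, which is why each new size (80, 96, 112, ...) has to be added explicitly rather than falling out for free.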

jpf888 commented 6 months ago

Head size 112 is needed, +1. Thanks!

@yzh119

jpf888 commented 4 months ago

@yzh119 Hello, may I ask when head dim 64 will be supported?