flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

cmake: macro trimming #235

Closed yzh119 closed 4 months ago

yzh119 commented 5 months ago

Trim macros so that CMake can be configured to compile only a subset of kernels.

After this PR, we can choose which kernels to compile by editing config.cmake. For example, if we add these lines to config.cmake:

set(FLASHINFER_GEN_GROUP_SIZES 1 8)
set(FLASHINFER_GEN_PAGE_SIZES 16)
set(FLASHINFER_GEN_HEAD_DIMS 128)
set(FLASHINFER_GEN_KV_LAYOUTS 0)
set(FLASHINFER_GEN_POS_ENCODING_MODES 0)
set(FLASHINFER_GEN_ALLOW_FP16_QK_REDUCTIONS "false")
set(FLASHINFER_GEN_CASUALS "false" "true")

then only the selected subset of kernels will be compiled, which can greatly reduce both compilation time and binary size (e.g. for the TVM binding).
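
To get a feel for how much these lists shrink the build, the sketch below enumerates the combinations implied by the settings above. It is only an illustration: it assumes one kernel variant is instantiated per combination of list entries, which is a simplification and not a description of FlashInfer's actual build logic. The helper `enumerate_variants` is hypothetical.

```python
from itertools import product

# Values copied from the config lines above.
# Assumption: one kernel variant per combination of entries (a simplified model).
config = {
    "FLASHINFER_GEN_GROUP_SIZES": [1, 8],
    "FLASHINFER_GEN_PAGE_SIZES": [16],
    "FLASHINFER_GEN_HEAD_DIMS": [128],
    "FLASHINFER_GEN_KV_LAYOUTS": [0],
    "FLASHINFER_GEN_POS_ENCODING_MODES": [0],
    "FLASHINFER_GEN_ALLOW_FP16_QK_REDUCTIONS": ["false"],
    "FLASHINFER_GEN_CASUALS": ["false", "true"],
}


def enumerate_variants(config):
    """Return one dict per combination of the configured option values."""
    names = list(config)
    value_lists = [config[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


variants = enumerate_variants(config)
print(f"{len(variants)} combinations")
for variant in variants:
    print(variant)
```

Under this model, every extra entry in a list multiplies the number of combinations, which is why trimming each list to the values a deployment actually needs can cut build time and binary size substantially.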