Closed yzh119 closed 4 months ago
Trim macros so that a subset of kernels can be selected for compilation through CMake.
After this PR, the set of kernels to compile can be customized by editing cmake.config. For example, if we add these lines to cmake.config:
cmake.config
set(FLASHINFER_GEN_GROUP_SIZES 1 8)
set(FLASHINFER_GEN_PAGE_SIZES 16)
set(FLASHINFER_GEN_HEAD_DIMS 128)
set(FLASHINFER_GEN_KV_LAYOUTS 0)
set(FLASHINFER_GEN_POS_ENCODING_MODES 0)
set(FLASHINFER_GEN_ALLOW_FP16_QK_REDUCTIONS "false")
set(FLASHINFER_GEN_CASUALS "false" "true")
Then only that subset of kernels is compiled, which can greatly reduce both compilation time and binary size (e.g. for the TVM binding).
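To get a rough feel for how much the settings above narrow the build, the sketch below enumerates the combinations they describe. It assumes that kernel variants are generated as the Cartesian product of the configured lists; the PR does not say this, and in practice some kernels may not depend on every dimension, so the real count may differ. The dictionary keys and the `enumerate_variants` helper are hypothetical names for illustration and are not part of FlashInfer.

```python
from itertools import product

# Values mirror the cmake.config example above.
# The keys are illustrative labels, not FlashInfer identifiers.
config = {
    "group_size": [1, 8],
    "page_size": [16],
    "head_dim": [128],
    "kv_layout": [0],
    "pos_encoding_mode": [0],
    "allow_fp16_qk_reduction": ["false"],
    "causal": ["false", "true"],
}


def enumerate_variants(cfg):
    """Return one dict per combination, assuming a full Cartesian product."""
    keys = list(cfg)
    return [dict(zip(keys, combo)) for combo in product(*(cfg[k] for k in keys))]


variants = enumerate_variants(config)
print(len(variants))  # 2 group sizes x 2 causal settings = 4
for v in variants:
    print(v)
```

Under that assumption, each extra value in any list multiplies the number of variants, which is why restricting the lists to the values actually needed shortens the build.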