flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

perf: add fastdiv for uint32_t #278

Closed: yzh119 closed this 1 month ago

yzh119 commented 1 month ago

#262 will degrade performance because integer division by `group_size` is slow.

This PR adds fastdiv support, so that division by a fixed divisor can be computed with faster shift and ifma operations instead of a hardware integer divide.
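A minimal C++ sketch of the idea follows. It assumes the Granlund-Montgomery "round-up" formulation, which may differ in detail from the Hacker's Delight figure the PR copied. The names `UintFastDiv`, `div` and `divmod` are hypothetical and are not FlashInfer's API. For a fixed divisor `d`, the constructor precomputes a multiplier `m` and a shift `s` once. After that, every division costs one multiply-high, one add and one shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a uint32_t fast divider (not FlashInfer's actual code).
// For a fixed divisor d it precomputes m and s such that, for every 32-bit n,
//   n / d == (mulhi(m, n) + n) >> s
// where the effective 33-bit magic number is 2^32 + m (Granlund-Montgomery).
struct UintFastDiv {
  uint32_t d;  // the divisor
  uint32_t m;  // low 32 bits of the 33-bit magic number
  uint32_t s;  // shift amount, equal to ceil(log2(d))

  explicit UintFastDiv(uint32_t divisor) : d(divisor), m(0), s(0) {
    assert(divisor != 0);
    // s = ceil(log2(d)): the smallest s with 2^s >= d (0 <= s <= 32).
    while ((uint64_t{1} << s) < d) ++s;
    // m = floor(2^32 * (2^s - d) / d) + 1. It fits in 32 bits because
    // 2^s - d < d, and the 64-bit numerator cannot overflow.
    m = static_cast<uint32_t>((((uint64_t{1} << s) - d) << 32) / d + 1);
  }

  // Quotient via multiply-high, add and shift; no hardware divide.
  uint32_t div(uint32_t n) const {
    uint64_t hi = (static_cast<uint64_t>(m) * n) >> 32;  // mulhi(m, n)
    // hi + n needs up to 33 bits, so keep the sum in 64 bits before shifting.
    return static_cast<uint32_t>((hi + n) >> s);
  }

  // Quotient and remainder; the remainder needs one extra multiply-subtract.
  void divmod(uint32_t n, uint32_t& q, uint32_t& r) const {
    q = div(n);
    r = n - q * d;
  }
};
```

For example, `UintFastDiv fd(group_size);` can be built once on the host and reused for many `fd.div(i)` calls. In a CUDA kernel the 64-bit multiply-and-shift would typically be replaced by the `__umulhi` intrinsic. The 64-bit sum in `div` is a simplification for portability. A 32-bit-only variant would instead compute `(hi + ((n - hi) >> 1)) >> (s - 1)` with care for `s == 0`.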

We copied the code from Figure 10-2 in Hacker's Delight (2nd edition). The data structure was inspired by the https://github.com/milakov/int_fastdiv/ project, and we kept its license.