flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

misc: remove duplicate norm cuda kernels #631

Closed yzh119 closed 13 hours ago


Gemma-style RMSNorm kernels (introduced in #477) are nearly identical to the original RMSNorm kernel: the Gemma variant scales the normalized input by (1 + weight) instead of by weight. Both variants should therefore share one kernel. This PR removes the duplicated code and unifies the Gemma-style and original RMSNorm kernels.
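To illustrate why a single kernel can serve both variants, here is a minimal Python reference sketch. It is not FlashInfer's CUDA implementation or its Python API. It assumes the Gemma-style variant computes x / rms(x) * (1 + w), and the names `rmsnorm`, `gemma_rmsnorm` and `weight_bias` are hypothetical, chosen for illustration only.

```python
import math
from typing import List


def rmsnorm(x: List[float], weight: List[float], eps: float = 1e-6,
            weight_bias: float = 0.0) -> List[float]:
    """Reference RMSNorm over a single row (hypothetical helper).

    weight_bias = 0.0 -> original RMSNorm:     y = x / rms(x) * w
    weight_bias = 1.0 -> Gemma-style RMSNorm:  y = x / rms(x) * (1 + w)
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    # Mean of squares over the row, then the reciprocal RMS.
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # The only variant-specific step: shift the weight by weight_bias.
    return [v * inv_rms * (w + weight_bias) for v, w in zip(x, weight)]


def gemma_rmsnorm(x: List[float], weight: List[float],
                  eps: float = 1e-6) -> List[float]:
    # Hypothetical wrapper: same code path, weight shifted by 1.
    return rmsnorm(x, weight, eps, weight_bias=1.0)
```

In this sketch the Gemma-style result equals the original RMSNorm applied with weights w + 1, so a single kernel parameterized by a constant weight offset can cover both variants instead of maintaining two copies.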

The precision improvements from https://github.com/flashinfer-ai/flashinfer/pull/587 and https://github.com/flashinfer-ai/flashinfer/pull/592 are preserved in the unified kernels.