flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Have any plans to optimize the decode kernel for NV-Hopper #576

Open JamesLim-sy opened 3 hours ago

JamesLim-sy commented 3 hours ago

I noticed that Hopper's thread-block cluster feature may offer an opportunity to optimize batch_decode performance by merging VariableLengthMergeStates into BatchDecodeWithPagedKVCacheKernel. Is there any plan to use SM90 features for it?
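For context, the merge step in question combines partial attention outputs computed over different KV chunks into a single result, using each chunk's log-sum-exp (LSE) for numerically stable renormalization. Below is a minimal NumPy sketch of that merge math only (not FlashInfer's actual CUDA implementation, and the function name `merge_states` here is illustrative):

```python
import numpy as np

def merge_states(outputs: np.ndarray, lses: np.ndarray):
    """Merge partial attention states from several KV chunks.

    outputs: (num_chunks, head_dim) -- each chunk's softmax-normalized
             partial attention output
    lses:    (num_chunks,)          -- each chunk's log-sum-exp of the
             attention scores (softmax denominator in log space)
    """
    lse_max = np.max(lses)
    # Rescale each chunk by exp(lse_i - lse_max) so all chunks share
    # one softmax denominator; subtracting the max keeps exp() stable.
    weights = np.exp(lses - lse_max)                      # (num_chunks,)
    merged = (weights[:, None] * outputs).sum(axis=0) / weights.sum()
    merged_lse = lse_max + np.log(weights.sum())
    return merged, merged_lse
```

Fusing this reduction into the decode kernel itself (e.g. via SM90 cluster-level communication through distributed shared memory) would avoid the extra kernel launch and global-memory round trip for the partial states.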

zhyncs commented 3 hours ago

> Is there any plan to use SM90 features for it?

ref https://github.com/flashinfer-ai/flashinfer/pull/507#issue-2547125600