Open JamesLim-sy opened 3 hours ago
I noticed that Hopper thread block clusters might give a chance to improve batch_decode performance by merging `VariableLengthMergeStates` into `BatchDecodeWithPagedKVCacheKernel`. Is there any plan to use SM90 features for this?
ref https://github.com/flashinfer-ai/flashinfer/pull/507#issue-2547125600
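For context, here is a minimal Python sketch of what I understand the merge step to compute, assuming each KV chunk produces a normalized partial output plus the log-sum-exp (LSE) of its attention scores. The names `partial_attention` and `merge_states` are hypothetical and are not FlashInfer's API, and the softmax scale is omitted for brevity. It only illustrates the math that a fused kernel would need to reproduce.

```python
import math

def partial_attention(q, keys, values):
    """Attention over one KV chunk: returns (normalized output, LSE of scores)."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom
           for d in range(dim)]
    return out, m + math.log(denom)

def merge_states(states):
    """Combine per-chunk (output, lse) pairs into the full-sequence result."""
    m = max(lse for _, lse in states)
    # Each chunk is weighted by its share of the total softmax mass.
    scales = [math.exp(lse - m) for _, lse in states]
    total = sum(scales)
    dim = len(states[0][0])
    merged = [sum(s * o[d] for s, (o, _) in zip(scales, states)) / total
              for d in range(dim)]
    return merged, m + math.log(total)

# Toy data (illustrative only): one query, four KV entries split into two chunks.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full_out, full_lse = partial_attention(q, keys, values)
chunk_states = [partial_attention(q, keys[:2], values[:2]),
                partial_attention(q, keys[2:], values[2:])]
merged_out, merged_lse = merge_states(chunk_states)
```

Merging the two chunk states gives the same output and LSE as attention over the whole sequence. My guess is that a cluster-based version could exchange the per-chunk (output, LSE) pairs through distributed shared memory instead of a separate merge kernel launch, but that is an assumption about the design, not something I have tested.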