This PR refactors the batch decode kernels and makes the following breaking changes:
1. batch_decode_with_padded_kv_cache operator: we encourage users to use BatchDecodeWithPagedKVCacheWrapper instead.
2. The output data type now follows the query data type.
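For readers moving from a padded KV cache to a paged one, here is a minimal NumPy reference sketch of what single-token batch decode over a paged KV cache computes, including the rule that the output data type follows the query data type. Everything in it is an illustrative assumption, not the API or layout of BatchDecodeWithPagedKVCacheWrapper: the function name paged_decode_reference is hypothetical, the cache is assumed to be one array of shape (num_pages, 2, page_size, num_heads, head_dim) addressed through CSR-style kv_indptr / kv_indices / kv_last_page_len arrays, query and KV head counts are assumed equal, and every request is assumed to own at least one page.

```python
import numpy as np


def paged_decode_reference(q, kv_data, kv_indptr, kv_indices, kv_last_page_len):
    """Hypothetical reference for batch decode over a paged KV cache.

    Assumed layout (illustrative only):
      q:                (batch, num_heads, head_dim), one query token per request
      kv_data:          (num_pages, 2, page_size, num_heads, head_dim);
                        [:, 0] holds keys, [:, 1] holds values
      kv_indptr:        (batch + 1,); request i owns
                        kv_indices[kv_indptr[i]:kv_indptr[i + 1]]
      kv_last_page_len: (batch,); valid slots (1..page_size) in the last page
    """
    batch, num_heads, head_dim = q.shape
    page_size = kv_data.shape[2]
    out = np.empty((batch, num_heads, head_dim), dtype=np.float32)
    for i in range(batch):
        pages = kv_indices[kv_indptr[i]:kv_indptr[i + 1]]
        # Gather this request's pages and flatten them into one token sequence.
        k = kv_data[pages, 0].reshape(-1, num_heads, head_dim)
        v = kv_data[pages, 1].reshape(-1, num_heads, head_dim)
        # Drop the unused tail slots of the last page.
        seq_len = (len(pages) - 1) * page_size + kv_last_page_len[i]
        k = k[:seq_len].astype(np.float32)
        v = v[:seq_len].astype(np.float32)
        qi = q[i].astype(np.float32)
        # Scaled dot-product scores per head: (num_heads, seq_len).
        scores = np.einsum("hd,shd->hs", qi, k) / np.sqrt(head_dim)
        # Numerically stable softmax over the sequence axis.
        scores -= scores.max(axis=-1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)
        out[i] = np.einsum("hs,shd->hd", probs, v)
    # Accumulate in float32, then cast: the output dtype follows the query
    # dtype, independent of the KV cache dtype.
    return out.astype(q.dtype)
```

With a float16 query and a float32 KV cache, this sketch returns a float16 output. A padded cache maps onto this layout by giving each request ceil(seq_len / page_size) pages and recording how many slots of its last page are filled.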