Duplicate of #75, but re-based on the main branch.
Note that to support CUDAGraph, kv_chunk_size cannot be a function argument: arguments are passed by value, so the value is fixed once the kernel launch is captured by CUDAGraph. Instead, we pass it through kv_chunk_size_ptr, a pointer to a global memory address that stores kv_chunk_size; its value can be set in the BeginForward functions.
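To illustrate the constraint, here is a minimal standalone sketch, not the code in this PR. The kernel `ReadChunkSize` and the helper `CaptureThenReplay` are hypothetical names made up for the example; only the CUDA runtime calls are real. It captures one launch into a graph, changes the chunk size afterwards, and replays: the by-value argument keeps the captured value, while the value read through the pointer reflects the update.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error: %s (line %d)\n",              \
                   cudaGetErrorString(err_), __LINE__);               \
      std::exit(1);                                                   \
    }                                                                 \
  } while (0)

// Hypothetical kernel: records the chunk size as seen through both paths.
__global__ void ReadChunkSize(int by_value, const int* kv_chunk_size_ptr, int* out) {
  out[0] = by_value;            // baked into the graph node at capture time
  out[1] = *kv_chunk_size_ptr;  // loaded from global memory on every replay
}

struct ReplayResult {
  int by_value;
  int by_pointer;
};

// Hypothetical helper: capture with `captured_size`, then replay after
// overwriting the device-side value with `updated_size`.
ReplayResult CaptureThenReplay(int captured_size, int updated_size) {
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreate(&stream));

  int* d_size = nullptr;
  int* d_out = nullptr;
  CUDA_CHECK(cudaMalloc(&d_size, sizeof(int)));
  CUDA_CHECK(cudaMalloc(&d_out, 2 * sizeof(int)));
  CUDA_CHECK(cudaMemcpy(d_size, &captured_size, sizeof(int), cudaMemcpyHostToDevice));

  // Capture a single kernel launch into a graph.
  cudaGraph_t graph;
  cudaGraphExec_t exec;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  ReadChunkSize<<<1, 1, 0, stream>>>(captured_size, d_size, d_out);
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

  // Plays the role of a BeginForward-style update: only the memory changes,
  // the captured graph is left untouched.
  CUDA_CHECK(cudaMemcpy(d_size, &updated_size, sizeof(int), cudaMemcpyHostToDevice));

  CUDA_CHECK(cudaGraphLaunch(exec, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  int h_out[2] = {0, 0};
  CUDA_CHECK(cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(exec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaFree(d_size));
  CUDA_CHECK(cudaFree(d_out));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return {h_out[0], h_out[1]};
}

int main() {
  ReplayResult r = CaptureThenReplay(256, 1024);
  assert(r.by_value == 256);     // stale: fixed at capture time
  assert(r.by_pointer == 1024);  // fresh: read from global memory at replay
  std::printf("by value: %d, by pointer: %d\n", r.by_value, r.by_pointer);
  return 0;
}
```

Assuming a CUDA 11.4+ toolkit (for `cudaGraphInstantiateWithFlags`), this should build with `nvcc` and needs a GPU to run. The pointer itself is still captured by value, so the buffer it points to must stay allocated at the same address for as long as the graph is replayed.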