Closed nandor closed 4 days ago
The cascade kernels can take a dynamic sequence length in order to allow the number of tokens to vary when executed under CUDA graphs.
This is the first step towards implementing CUDA graph support for arbitrary `qo_indptr` contents, as tracked by #626.
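
To illustrate why a dynamic length helps, here is a minimal pure-Python sketch of the general idea, not the code in this PR. A captured graph replays a launch whose shape is fixed, so the actual token count has to be read from a buffer at replay time. `MAX_TOKENS`, `capture`, `total_tokens`, and `seq_len_buf` are hypothetical names made up for this example, and the sketch assumes `qo_indptr` holds CSR-style offsets whose last entry is the total number of tokens.

```python
# Hypothetical stand-in for a kernel replayed under a CUDA graph (no GPU involved).
MAX_TOKENS = 8  # launch size fixed at "capture" time (assumed upper bound)


def capture(seq_len_buf, src, dst):
    """Return a replayable launch: fixed shape, length read at replay time."""
    def replay():
        n = seq_len_buf[0]             # read when replayed, not baked in at capture
        for tid in range(MAX_TOKENS):  # always "launches" MAX_TOKENS threads
            if tid < n:                # surplus threads do nothing
                dst[tid] = src[tid] * 2
    return replay


def total_tokens(qo_indptr):
    # Assumption: request i owns rows qo_indptr[i]:qo_indptr[i + 1].
    return qo_indptr[-1]


seq_len_buf = [0]
src = list(range(MAX_TOKENS))
dst = [0] * MAX_TOKENS
graph = capture(seq_len_buf, src, dst)

seq_len_buf[0] = total_tokens([0, 2, 3])     # 3 tokens across 2 requests
graph()
first = list(dst)

seq_len_buf[0] = total_tokens([0, 1, 4, 6])  # 6 tokens, same captured launch
graph()
second = list(dst)
```

The same captured launch handles both batches because only the contents of `seq_len_buf` change between replays. In a real kernel the equivalent would be reading the length from device memory rather than receiving it as a launch-time argument.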