Hi, I'm running the example code in `flashinfer.BatchPrefillWithPagedKVCacheWrapper`, but it fails at creating the attention mask. Here is the code that raises the error:
```python
mask_arr = []
qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1)
          + paged_kv_last_page_len).cpu().tolist()
for i in range(batch_size):
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    mask_arr.append(mask_i)
mask = torch.cat(mask_arr, dim=0)
```
Is padding the mask of each request to the same size on dimension 0 the correct way to fix this?
Thanks for reporting the bug. It looks like a typo in the docstring; please see if https://github.com/flashinfer-ai/flashinfer/pull/331 resolves your concern.
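For anyone hitting this before the fix lands: the per-request masks have shape `(qo_len[i], kv_len[i])` and differ in `kv_len`, so `torch.cat(mask_arr, dim=0)` cannot stack them as 2D tensors. Below is a minimal sketch of what the corrected construction would presumably look like, assuming the docstring typo is a missing `.flatten()` before concatenation and that the wrapper consumes the custom mask as a packed 1D tensor of length `sum(qo_len[i] * kv_len[i])`; the indptr values are made up for illustration.

```python
# Sketch only: illustrative shapes, not a definitive reproduction of PR #331.
import torch

page_size = 16
batch_size = 7
# Hypothetical request layout (values are made up for this example).
qo_indptr = torch.tensor([0, 33, 44, 55, 66, 77, 88, 100], device="cuda:0")
paged_kv_indptr = torch.tensor([0, 17, 29, 44, 48, 66, 100, 128], device="cuda:0")
paged_kv_last_page_len = torch.tensor([1, 7, 14, 4, 3, 1, 16], device="cuda:0")

qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1)
          + paged_kv_last_page_len).cpu().tolist()

mask_arr = []
for i in range(batch_size):
    # Causal mask for request i: True where a query position may attend;
    # the diagonal offset aligns the last query row with the last kv column.
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    # Flatten before concatenation so requests with different kv_len
    # can be packed into a single ragged 1D mask.
    mask_arr.append(mask_i.flatten())

mask = torch.cat(mask_arr, dim=0)  # shape: (sum(qo_len[i] * kv_len[i]),)
```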