flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Sizes of tensors must match except in dimension 0 when creating mask #330

Closed: llx-08 closed this issue 1 week ago

llx-08 commented 1 week ago

Hi, I'm running the example code in flashinfer.BatchPrefillWithPagedKVCacheWrapper, but it fails when creating the attention mask. Here is the code that raises the error:

import torch

mask_arr = []
qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1) + paged_kv_last_page_len).cpu().tolist()
for i in range(batch_size):
    # Causal mask for request i, shape (qo_len[i], kv_len[i]).
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    mask_arr.append(mask_i)

# Raises "Sizes of tensors must match except in dimension 0": the 2D masks
# differ in kv_len[i] along dimension 1, so they cannot be concatenated.
mask = torch.cat(mask_arr, dim=0)

Is padding each request's mask to the same size, so they can be concatenated along dimension 0, the correct way to fix this?

yzh119 commented 1 week ago

Thanks for reporting the bug. It looks like a typo in the docstring; please see whether https://github.com/flashinfer-ai/flashinfer/pull/331 resolves your concern.
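
A minimal sketch of what the corrected construction likely looks like, assuming the docstring fix is to flatten each per-request mask to 1D before concatenating, so the ragged masks pack into a single contiguous tensor. This is a hedged reconstruction, not a verbatim copy of the PR; variable names reuse the snippet above:

import torch

mask_arr = []
qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1) + paged_kv_last_page_len).cpu().tolist()
for i in range(batch_size):
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    # Flatten each (qo_len[i], kv_len[i]) mask to 1D so the ragged
    # per-request masks can be concatenated into one packed tensor of
    # length sum(qo_len[i] * kv_len[i]).
    mask_arr.append(mask_i.flatten())

mask = torch.cat(mask_arr, dim=0)

Under this layout no padding is needed: each request's mask region in the packed 1D tensor can be recovered from its qo and kv lengths.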