flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Sizes of tensors must match except in dimension 0 when creating mask #330

Closed: llx-08 closed this issue 1 week ago

llx-08 commented 1 week ago

Hi, I'm running the example code in flashinfer.BatchPrefillWithPagedKVCacheWrapper, but it fails when creating the attention mask. Here is the code that raises the error:

import torch

mask_arr = []
qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1) + paged_kv_last_page_len).cpu().tolist()
for i in range(batch_size):
    # Causal mask for request i, shape (qo_len[i], kv_len[i]).
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    mask_arr.append(mask_i)

# Raises "Sizes of tensors must match except in dimension 0": the 2D masks
# differ in kv_len[i] along dimension 1, so they cannot be concatenated.
mask = torch.cat(mask_arr, dim=0)

Is padding each request's mask to the same size, so they can be concatenated along dimension 0, the correct way to fix this?

yzh119 commented 1 week ago

Thanks for reporting the bug. It looks like a typo in the docstring; please see whether https://github.com/flashinfer-ai/flashinfer/pull/331 resolves your concern.
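
A minimal sketch of what the corrected construction likely looks like, assuming the docstring fix is to flatten each per-request mask to 1D before concatenating, so the ragged masks pack into a single contiguous tensor. This is a hedged reconstruction, not a verbatim copy of the PR; variable names reuse the snippet above:

import torch

mask_arr = []
qo_len = (qo_indptr[1:] - qo_indptr[:-1]).cpu().tolist()
kv_len = (page_size * (paged_kv_indptr[1:] - paged_kv_indptr[:-1] - 1) + paged_kv_last_page_len).cpu().tolist()
for i in range(batch_size):
    mask_i = torch.tril(
        torch.full((qo_len[i], kv_len[i]), True, device="cuda:0"),
        diagonal=(kv_len[i] - qo_len[i]),
    )
    # Flatten each (qo_len[i], kv_len[i]) mask to 1D so the ragged
    # per-request masks can be concatenated into one packed tensor of
    # length sum(qo_len[i] * kv_len[i]).
    mask_arr.append(mask_i.flatten())

mask = torch.cat(mask_arr, dim=0)

Under this layout no padding is needed: each request's mask region in the packed 1D tensor can be recovered from its qo and kv lengths.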