flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

How large the page_size could be? #320

Closed llx-08 closed 2 weeks ago

llx-08 commented 2 weeks ago

Hello, I encountered this error:

```
RuntimeError: BatchPrefillWithPagedKVCachePyTorchWrapper::Forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor,
at::Tensor, bool, unsigned int, bool, float, float, float, bool)::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::
<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>::<lambda()>
failed to dispatch page size 512
```

when running BatchPrefillWithPagedKVCacheWrapper.begin_forward on A100 after setting the page_size to 512. I am curious about the maximum allowable page_size and the factors that influence this limitation. Any information would be greatly appreciated. Thank you!
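For context, `page_size` controls how each sequence's KV cache is split into fixed-size pages, with only the last page allowed to be partially filled. A minimal sketch of the layout arithmetic (pure Python with a hypothetical helper name, not FlashInfer's actual API):

```python
import math

def paged_kv_layout(seq_len: int, page_size: int):
    """Compute the paged layout for one sequence's KV cache.

    Returns the number of pages needed and the number of valid
    entries in the last (possibly partially filled) page.
    """
    num_pages = math.ceil(seq_len / page_size)
    last_page_len = seq_len - (num_pages - 1) * page_size
    return num_pages, last_page_len

# A 1300-token sequence with page_size=512 needs 3 pages;
# the last page holds 1300 - 2 * 512 = 276 entries.
print(paged_kv_layout(1300, 512))  # (3, 276)
```

Larger pages mean fewer index entries but coarser memory granularity, which is why serving systems typically keep `page_size` small (e.g. 16) and why a cap of 512 here was surprising.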

yzh119 commented 2 weeks ago

Hi, @llx-08 , the restriction of page size was removed in v0.0.5. The wheels should be available in a few hours (https://github.com/flashinfer-ai/flashinfer/actions/runs/9594555333).

llx-08 commented 2 weeks ago

> Hi, @llx-08 , the restriction of page size was removed in v0.0.5. The wheels should be available in a few hours (https://github.com/flashinfer-ai/flashinfer/actions/runs/9594555333).

Oh, thank you! By the way, I ran some tests with last_page_len > page_size and they didn't raise any errors. I'm unsure whether this is a hidden bug...

yzh119 commented 2 weeks ago

We don't check the value of last_page_len in the kernel implementation, so last_page_len > page_size is undefined behavior.
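To illustrate why an unchecked last_page_len matters: the total KV length the kernel reads is derived from the page count and last_page_len, so an out-of-range value silently corrupts that length. A hedged sketch of the check a caller could apply (hypothetical helper, not part of FlashInfer):

```python
def kv_len(num_pages: int, last_page_len: int, page_size: int) -> int:
    """Total KV length implied by a paged layout: all full pages
    plus the valid entries in the last page."""
    # The invariant the kernel assumes but (per this thread) does not verify:
    if not (1 <= last_page_len <= page_size):
        raise ValueError(
            f"last_page_len={last_page_len} must be in [1, {page_size}]"
        )
    return (num_pages - 1) * page_size + last_page_len

print(kv_len(3, 276, 512))  # 1300
```

With last_page_len > page_size the computed length would exceed the allocated pages, so reads past the last page's end are exactly the kind of out-of-bounds access "undefined behavior" implies.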

yzh119 commented 2 weeks ago

v0.0.5 was released; I'll close this issue for now.