flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving
https://flashinfer.ai
Apache License 2.0

Shared-prefix rope issue #194

Open lkc1997 opened 5 months ago

lkc1997 commented 5 months ago

[image]

I found that during the shared-prefix calculation, this kernel won't use `qo_indptr` to split the batched queries, which may cause RoPE errors.
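For reference, a minimal sketch of what per-request position handling would look like, assuming `qo_indptr` is the usual CSR offset array of queries per request and that each request's new tokens start immediately after a shared prefix of length `prefix_len` (`rope_positions` is a hypothetical helper written for illustration, not a FlashInfer API):

```python
import torch

def rope_positions(qo_indptr: torch.Tensor, prefix_len: int) -> torch.Tensor:
    """Per-token RoPE positions for a batch of variable-length query segments.

    qo_indptr: [batch_size + 1] CSR offsets; request i owns query tokens
    qo_indptr[i]:qo_indptr[i+1]. Each request's tokens are assumed to start
    right after a shared prefix of length prefix_len.
    """
    lens = (qo_indptr[1:] - qo_indptr[:-1]).tolist()
    # Restart the within-request offset at 0 for every request, instead of
    # numbering all queries in the batch contiguously.
    offsets = torch.cat([torch.arange(n, device=qo_indptr.device) for n in lens])
    return prefix_len + offsets

# e.g. qo_indptr = [0, 3, 5] with prefix_len = 16
# -> positions [16, 17, 18, 16, 17], not [16, 17, 18, 19, 20]
```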

yzh119 commented 3 months ago

Hi @lkc1997, our C++ APIs have a few additional arguments (e.g. `rope_position`, to specify the RoPE position of each query), but we haven't ported them to the PyTorch APIs yet; I'll do that in the next release.

For the moment, you can apply RoPE outside of attention and call the attention kernel with RoPE disabled.
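A minimal sketch of that workaround in PyTorch: the rotation below is the standard non-interleaved (GPT-NeoX-style) RoPE, `apply_rope` is hand-written here rather than a FlashInfer API, and the `pos_encoding_mode="NONE"` mentioned in the final comment is an assumption about how the PyTorch wrappers disable in-kernel RoPE:

```python
import torch

def apply_rope(x: torch.Tensor, pos: torch.Tensor, theta: float = 10000.0) -> torch.Tensor:
    """Non-interleaved (GPT-NeoX-style) rotary embedding.

    x: [num_tokens, num_heads, head_dim]; pos: [num_tokens] absolute positions.
    """
    half = x.size(-1) // 2
    freqs = 1.0 / (theta ** (torch.arange(half, device=x.device, dtype=torch.float32) / half))
    angles = pos.float()[:, None] * freqs[None, :]            # [num_tokens, half]
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., :half].float(), x[..., half:].float()
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).to(x.dtype)

# Two requests with 3 and 2 new tokens each, after a shared prefix of length 16.
qo_indptr = torch.tensor([0, 3, 5])
pos = 16 + torch.cat([torch.arange(3), torch.arange(2)])

q = torch.randn(5, 8, 128, dtype=torch.float16)   # [num_tokens, num_heads, head_dim]
q = apply_rope(q, pos)
# Rotate the new keys the same way before appending them to the KV cache, then
# call the attention kernel with its built-in RoPE turned off (e.g. by passing
# pos_encoding_mode="NONE" to the wrapper, if the version you use supports it).
```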