Dao-AILab / flash-attention

Fast and memory-efficient exact attention

question on the block table #1314

chakpongchung opened this issue 1 day ago

chakpongchung commented 1 day ago

https://github.com/Dao-AILab/flash-attention/blob/478ee666cccbd1b8f63648633003059a8dc6827d/tests/test_flash_attn.py#L2066

Could you elaborate on the block_table argument here? I am trying to find an example of how flash_attn_with_kvcache should be used. Specifically, why does the user need to care about the logical-block to physical-block mapping when passing the KV cache, given that this function updates the cache in place? How should we construct the block table from the KV shape? I assume the block table shape depends only on the KV shape.

tridao commented 23 hours ago

You can read the function docstring https://github.com/Dao-AILab/flash-attention/blob/478ee666cccbd1b8f63648633003059a8dc6827d/flash_attn/flash_attn_interface.py#L1492

Typically this is used with a cache manager (e.g. in vLLM) that decides when to allocate and free blocks and constructs the block table. Such a cache manager is not implemented here, since it depends on how you build the inference engine.
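
For reference, here is a minimal sketch (not from the repo) of how the shapes fit together for a paged KV cache, following the flash_attn_with_kvcache docstring. The specific sizes, the dummy tensor values, and the trivial contiguous block assignment are assumptions for illustration only; a real engine would obtain the physical block indices from its cache manager.

```python
# Illustrative sketch, not part of the library. Shapes follow the
# flash_attn_with_kvcache docstring for a paged KV cache; the toy
# "allocator" (contiguous block assignment) is an assumption.
import torch
from flash_attn import flash_attn_with_kvcache

batch_size, nheads, nheads_k, headdim = 2, 8, 8, 128
page_block_size = 256                              # size of one physical KV block (page)
max_seqlen = 1024                                  # capacity we reserve per sequence
max_blocks_per_seq = max_seqlen // page_block_size
num_blocks = batch_size * max_blocks_per_seq       # total physical blocks in the pool

# Physical KV pool: (num_blocks, page_block_size, nheads_k, headdim)
k_cache = torch.zeros(num_blocks, page_block_size, nheads_k, headdim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)

# Block table: (batch_size, max_blocks_per_seq), int32. Row b lists the
# physical block indices backing sequence b's logical blocks, in order.
# Here we simply hand out blocks contiguously; a cache manager would
# normally decide this mapping.
block_table = torch.arange(num_blocks, dtype=torch.int32, device="cuda").reshape(
    batch_size, max_blocks_per_seq)

# Number of tokens already stored in the cache for each sequence.
cache_seqlens = torch.tensor([100, 250], dtype=torch.int32, device="cuda")

# One decoding step: a new query and new K/V for one token per sequence.
q = torch.randn(batch_size, 1, nheads, headdim, dtype=torch.float16, device="cuda")
k_new = torch.randn(batch_size, 1, nheads_k, headdim, dtype=torch.float16, device="cuda")
v_new = torch.randn_like(k_new)

# The function writes k_new/v_new into the paged cache in place (at position
# cache_seqlens, routed through block_table) and attends over the cached
# prefix plus the new token.
out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    k=k_new, v=v_new,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
)
```

So per the docstring, the block table has shape (batch_size, max_num_blocks_per_seq): it depends on the batch size and on how many logical blocks each sequence may need, not only on the K/V tensor shape. The KV pool itself is just a flat set of num_blocks physical pages that the block table indexes into, which is why the mapping is left to an external cache manager.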