apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Runtime] Support PagedKVCache with tree attention #17049

Closed MasterJH5574 closed 4 months ago

MasterJH5574 commented 4 months ago

This PR adds tree attention support to PagedKVCache. With this feature, the KV cache can handle tree-structured attention cases such as speculative decoding trees.

This PR also adds tree attention tests to verify correctness.

The changes this PR makes to the KVState interface are backward compatible.
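
For readers unfamiliar with tree attention, here is a minimal sketch of the idea in plain Python. It is not the PagedKVCache implementation or its API: the function name `tree_attention_mask` and the parent-array encoding of the draft tree are assumptions made for illustration only. In a speculative decoding tree, each draft token should attend only to itself and its ancestors, not to tokens on sibling branches.

```python
from typing import List


def tree_attention_mask(parents: List[int]) -> List[List[bool]]:
    """Build a boolean attention mask for a token tree.

    Hypothetical helper, for illustration only.
    parents[i] is the index of node i's parent, or -1 if node i is a root.
    Nodes are assumed to be listed in topological order (parent before child).
    mask[i][j] is True when node i may attend to node j, that is, when j is
    node i itself or one of its ancestors.
    """
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i, p in enumerate(parents):
        if p < -1 or p >= i:
            raise ValueError(f"node {i} has invalid parent index {p}")
        if p >= 0:
            # Inherit everything the parent can see (its ancestors and itself).
            mask[i] = list(mask[p])
        # Every node attends to itself.
        mask[i][i] = True
    return mask


# Example tree: node 0 is the root, nodes 1 and 2 are its children,
# node 3 is a child of node 1, and node 4 is a child of node 2.
example_parents = [-1, 0, 0, 1, 2]
example_mask = tree_attention_mask(example_parents)
for row in example_mask:
    print("".join("1" if visible else "0" for visible in row))
```

For a simple chain (`parents = [-1, 0, 1, ...]`) this reduces to the usual lower-triangular causal mask, which is one way to see why tree support can coexist with ordinary sequential decoding.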

tqchen commented 4 months ago

@tvm-bot rerun