Some speculative decoding algorithms require tree attention, which can be supported via the prefill/append attention kernels with a custom attention mask.
This PR adds support for this feature.
Related issues: #152
## API Breaking Changes
The `begin_forward` function in `BatchPrefillWithPagedKVCacheWrapper` now has an additional argument `page_size` to accommodate this new feature.
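Call sites of the wrapper need to be updated to pass the page size explicitly. A sketch of the change (the surrounding argument names are illustrative, based on typical usage of the wrapper; only the new `page_size` argument is introduced by this PR):

```diff
 wrapper = flashinfer.BatchPrefillWithPagedKVCacheWrapper(workspace_buffer, "NHD")
 wrapper.begin_forward(
     qo_indptr,
     paged_kv_indptr,
     paged_kv_indices,
     paged_kv_last_page_len,
     num_qo_heads,
     num_kv_heads,
     head_dim,
+    page_size,
 )
```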