Equationliu / Kangaroo

Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
https://arxiv.org/abs/2404.18911

Kangaroo when bsz is greater than 1. #6

cool-xiang closed this issue 1 week ago

cool-xiang commented 3 weeks ago

Hello, I would like to ask how Kangaroo works when the batch size (bsz) is greater than 1, and which parts of the code would need to be modified. Thank you!

Equationliu commented 1 week ago

The current version does not support batch decoding. Compared with other methods, Kangaroo has to handle a dynamic step size during drafting, in addition to the fact that different samples are not synchronized along the batch dimension (each sample may accept a different number of tokens per step).
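To make the two obstacles concrete, here is a minimal, framework-free Python sketch under stated assumptions: per sample, drafting stops before the first token whose draft confidence falls below a threshold; verification is greedy (keep the longest prefix of draft tokens that matches the target model's predictions, then append one target token); and the resulting ragged rows are right-padded. The names `draft_len`, `accepted_len`, `step`, `pad_batch`, and `PAD_ID`, as well as the threshold and step limit, are hypothetical and are not taken from the Kangaroo repository.

```python
from typing import List, Tuple

PAD_ID = -1  # hypothetical padding id used only in this sketch


def draft_len(confs: List[float], threshold: float, max_steps: int) -> int:
    """Dynamic draft length: stop before the first token whose confidence < threshold."""
    n = 0
    for conf in confs[:max_steps]:
        if conf < threshold:
            break
        n += 1
    return n


def accepted_len(draft: List[int], target: List[int]) -> int:
    """Greedy verification: length of the longest prefix where draft matches target."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def step(prefixes, draft_tokens, draft_confs, target_tokens, threshold=0.6, max_steps=4):
    """One draft-then-verify step applied independently to each sample in the batch."""
    out = []
    for prefix, toks, confs, tgt in zip(prefixes, draft_tokens, draft_confs, target_tokens):
        k = draft_len(confs, threshold, max_steps)  # draft length differs per sample
        a = accepted_len(toks[:k], tgt)             # accepted count differs per sample
        # Keep the a accepted tokens plus one token predicted by the target model.
        out.append(prefix + tgt[:a + 1])
    return out


def pad_batch(seqs: List[List[int]]) -> Tuple[List[List[int]], List[int]]:
    """Right-pad ragged sequences to a common width; also return the true lengths."""
    lengths = [len(s) for s in seqs]
    width = max(lengths, default=0)
    return [s + [PAD_ID] * (width - len(s)) for s in seqs], lengths


if __name__ == "__main__":
    prefixes = [[1, 2], [1, 2]]
    draft_tokens = [[10, 11, 12, 13], [20, 21, 22, 23]]
    draft_confs = [[0.9, 0.8, 0.7, 0.3], [0.9, 0.4, 0.9, 0.9]]
    # Target model's greedy prediction at each draft position (plus one extra).
    target_tokens = [[10, 11, 99, 98, 97], [20, 55, 56, 57, 58]]
    padded, lengths = pad_batch(step(prefixes, draft_tokens, draft_confs, target_tokens))
    print(padded)   # → [[1, 2, 10, 11, 99], [1, 2, 20, 55, -1]]
    print(lengths)  # → [5, 4]
```

In this toy run the two samples draft 3 and 1 tokens and advance by 3 and 2 tokens respectively, so after a single step the batch is already ragged. A batched decoder would then have to track per-sample positions, attention masks, and cache lengths on every step, bookkeeping that a bsz = 1 decoder does not need.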

cool-xiang commented 1 week ago

ok, thank you!