Reject Sampling & Adaptive Draft-Exiting Mechanism

dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**

Apache License 2.0

141 stars 9 forks source link

Very interesting work, thank you for your contribution to the open source community.

In the first two papers, verification is done by probability rejection sampling, but I saw that the implementation of the code directly compares the generated ids. Are these two equivalent? https://github.com/dilab-zju/self-speculative-decoding/blob/6c719da65ada6a4cd99ea32e84030caa9aae22c0/decoding.py#L137
I saw the design of Adaptive Draft-Exiting Mechanism. What is the reason for this design? I did not find any relevant theoretical proof or intuitive explanation.

Please correct me if I'm wrong, thanks.

dilab-zju / self-speculative-decoding