Equationliu / Kangaroo

Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
https://arxiv.org/abs/2404.18911

Why warmup when evaluating? #1

Closed EganGu closed 4 months ago

EganGu commented 4 months ago

I noticed that when evaluating, the function get_model_answers first runs inference 3 times on questions[0]. Is this a redundant step, or just for debugging?

https://github.com/Equationliu/Kangaroo/blob/8f7802bfd88192f031e7e7f0587b3651771bba68/evaluation/eval.py#L92

Equationliu commented 4 months ago

> I noticed that when evaluating, the function get_model_answers first runs inference 3 times on questions[0]. Is this a redundant step, or just for debugging?
>
> https://github.com/Equationliu/Kangaroo/blob/8f7802bfd88192f031e7e7f0587b3651771bba68/evaluation/eval.py#L92

We follow the evaluation toolkit of Spec-Bench, see https://github.com/hemingkx/Spec-Bench/blob/487adbe61f676bf453ec96667489d6cfaeca182f/evaluation/eval.py#L93.

Generally, performing a warmup before measuring end-to-end inference time makes the latency more stable: one-time costs such as CUDA context initialization, kernel compilation, and cache population are paid before the timed runs begin.
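As a minimal sketch of this pattern (the `generate` callable and `timed_generation` helper below are hypothetical stand-ins, not the actual Kangaroo/Spec-Bench code):

```python
import time


def timed_generation(generate, questions, num_warmup=3):
    """Run a few untimed warmup passes on the first question, then
    time every question. Warmup keeps one-time startup costs out of
    the measured latencies."""
    # Warmup: results are discarded; this only triggers lazy
    # initialization (CUDA context, kernel compilation, caches).
    for _ in range(num_warmup):
        generate(questions[0])

    # Timed runs: measure per-question end-to-end latency.
    latencies = []
    for question in questions:
        start = time.perf_counter()
        generate(question)
        latencies.append(time.perf_counter() - start)
    return latencies
```

With a real GPU model, one would also synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since kernel launches are asynchronous.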

EganGu commented 4 months ago

> > I noticed that when evaluating, the function get_model_answers first runs inference 3 times on questions[0]. Is this a redundant step, or just for debugging? https://github.com/Equationliu/Kangaroo/blob/8f7802bfd88192f031e7e7f0587b3651771bba68/evaluation/eval.py#L92
>
> We follow the evaluation toolkit of Spec-Bench, see https://github.com/hemingkx/Spec-Bench/blob/487adbe61f676bf453ec96667489d6cfaeca182f/evaluation/eval.py#L93.
>
> Generally, performing a warmup before measuring end-to-end inference time makes the latency more stable.

That makes sense. Thanks for your quick reply.