EganGu closed this issue 6 months ago.
I noticed that when evaluating, the called function `get_model_answers` first runs inference 3 times on questions[0]. Is this a redundant step, or is it just for debugging? https://github.com/Equationliu/Kangaroo/blob/8f7802bfd88192f031e7e7f0587b3651771bba68/evaluation/eval.py#L92
We follow the evaluation toolkit of Spec-Bench, see https://github.com/hemingkx/Spec-Bench/blob/487adbe61f676bf453ec96667489d6cfaeca182f/evaluation/eval.py#L93.
Generally, performing a warmup before measuring the end-to-end inference time helps make the latency numbers more stable.
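For reference, here is a minimal sketch of this warmup-then-measure pattern. `model.generate` and `inputs` are hypothetical stand-ins for the actual Kangaroo/Spec-Bench inference call; the point is only that the first few calls pay one-time costs and should not be timed.

```python
import time
import torch

def timed_generate(model, inputs, warmup_runs=3):
    # Warmup: the first few calls pay one-time costs (CUDA context
    # setup, kernel compilation/autotuning, memory-pool growth) that
    # would otherwise inflate the measured latency.
    for _ in range(warmup_runs):
        model.generate(**inputs)

    torch.cuda.synchronize()  # let the warmup kernels finish first
    start = time.perf_counter()
    output = model.generate(**inputs)
    torch.cuda.synchronize()  # GPU work is async; wait before stopping the clock
    latency = time.perf_counter() - start
    return output, latency
```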
That makes sense. Thanks for your quick reply.