SafeAILab / EAGLE

Official Implementation of EAGLE-1 and EAGLE-2
https://arxiv.org/pdf/2406.16858
Apache License 2.0

EAGLE-2 is slower than EAGLE-1 #88

Open yjdy opened 1 week ago

yjdy commented 1 week ago

Thanks for this great repo. I have tested EAGLE-1 and EAGLE-2 on vicuna-7b, but I found that EAGLE-2 is slower than EAGLE-1, 69 tokens/s and 66 tokens/s respectively. The inference test is on MT-bench, using a V100 with 32 GB of memory, and the batch size is 1.

Is it normal?

Best regards

yjdy commented 1 week ago

I made a mistake above: the inference speed of EAGLE-2 is 66 tokens/s and EAGLE-1 is 69 tokens/s. Also, the temperature is 0.

hongyanz commented 1 week ago

It is not normal. Can you provide more details (e.g., whether you are running something else on your machine, what your environment is, and which code you are running)? Without them, it is hard to debug your setup.

yjdy commented 1 week ago

Some details of my environment are listed as follows:

GPU: 1x V100 (32 GB memory)
Python: 3.10.14
CUDA: 11.7
Driver Version: 515.65.01
torch: 2.1.0
triton: 2.1.0
transformers: 4.36.2

I just ran the evaluation script gen_ea_answer_vicuna.py as suggested in the README, with batch size = 1 and temperature = 0.

Liyuhui-12 commented 1 week ago

A possible reason is that total_token was not set correctly.

yjdy commented 6 days ago

Thanks for the response. Can you give me some advice on setting total_token? Should I set it larger or smaller?

Lucas-TY commented 5 days ago

Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

python -m eagle.evaluation.gen_ea_answer_vicuna\
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3\
        --base-model-path lmsys/vicuna-7b-v1.3

python -m eagle.evaluation.gen_baseline_answer_vicuna\
        --ea-model-path yuhuili/EAGLE-Vicuna-7B-v1.3\
        --base-model-path lmsys/vicuna-7b-v1.3

{"question_id": 81, "answer_id": "TP4CRrbLYBqFHdQqoeb7ug", "model_id": "ess-vicuna-70b-fp16-baseline-temperature-1.0", "choices": [{"index": 0, "turns": ["....... "idxs": [603, 603], "new_tokens": [0, 0], "wall_time": [8.09636378288269, 7.946403741836548]}], "tstamp": 1720166094.253764}

Liyuhui-12 commented 17 hours ago

> Thanks for the response. Can you give me some advice on setting total_token? Should I set it larger or smaller?

Overall, the smaller the model and the more powerful the computational capacity, the larger this value should be.
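For reference, a minimal sketch of how total_token can be passed when loading the model, assuming the EaModel.from_pretrained interface shown in the repo's README (the value used here is only an illustrative assumption; -1 should let EAGLE-2 pick one automatically, while smaller explicit values reduce the draft-tree cost on a weaker GPU such as a V100):

import torch
from eagle.model.ea_model import EaModel

# Sketch only: total_token controls the size of EAGLE-2's draft tree.
# -1 lets the repo choose automatically; on a V100 a smaller explicit value may be faster.
model = EaModel.from_pretrained(
    base_model_path="lmsys/vicuna-7b-v1.3",
    ea_model_path="yuhuili/EAGLE-Vicuna-7B-v1.3",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=48,  # assumed example value; try smaller (e.g. 32) if the GPU is the bottleneck
)
model.eval()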

Liyuhui-12 commented 16 hours ago

> Hi, the benchmark can't record new tokens correctly; I don't know if that's normal.

It is normal for the baseline not to return new tokens.
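As a rough check, throughput can be recomputed from the generated answer .jsonl file using the new_tokens and wall_time fields shown above. A sketch (the file name is hypothetical; it does not work for the baseline file, whose new_tokens entries are 0, so the baseline's output text would have to be re-tokenized instead):

import json

# Sketch: overall tokens/s across all questions in an EAGLE answer file.
total_tokens, total_time = 0, 0.0
with open("ea_answers.jsonl") as f:  # hypothetical path to the generated answer file
    for line in f:
        record = json.loads(line)
        for choice in record["choices"]:
            total_tokens += sum(choice["new_tokens"])
            total_time += sum(choice["wall_time"])
print(f"{total_tokens / total_time:.1f} tokens/s")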