dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**
Apache License 2.0

Unable to get 1.5 speedup using 13B model? #12

Closed: w32zhong closed this issue 4 months ago

w32zhong commented 5 months ago

Hi, I am trying to replicate results by running the evaluate_sum.ipynb notebook.

Here is what I get:

```
'E2E mean token speed up essg th 0.2': '0.6583',
'E2E mean token speed up essg th 0.4': '0.4406',
'E2E mean token speed up essg th 0.6': '1.0358',
'E2E mean token speed up essg th 0.8': '0.7673',
'E2E mean token speed up essg autoth': '1.0797'
```

To my understanding, this method should achieve at most a 1.5× speedup for a 7B model. Does that mean I have to average all data rows to get that number?

Thanks in advance.

junzhang-zj commented 5 months ago

It is recommended to check whether your transformers environment is consistent with ours. The result does not need averaging over all rows; just take the last output row directly. The following are the results of our recent tests across the various ROUGE standards:

```
data 999, {'mean rouge-2 base': '0.1062', 'mean rouge-2 essg autoth': '0.1078',
'mean rouge-1 base': '0.2628', 'mean rouge-1 essg autoth': '0.2660',
'mean rouge-L base': '0.1806', 'mean rouge-L essg autoth': '0.1831',
'mean time base': '26.0185', 'mean time essg autoth': '16.5029',
'E2E mean speed up essg autoth': '1.5766',
'mean token time base': '0.0508', 'mean token time essg autoth': '0.0322',
'E2E mean token speed up essg autoth': '1.5766',
'mean matchness essg autoth': '0.9187',
'mean num_drafted_tokens essg autoth': '461.6900'}
```
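For what it's worth, the end-to-end speedup in that final row appears to be simply the ratio of the two mean generation times. A minimal sketch (the `results` dict is hypothetical; the keys and values mirror the printed output above):

```python
# Sketch: reproduce the reported E2E speedup from the mean-time entries
# in the final output row. Values are taken from the test output above.
results = {
    "mean time base": 26.0185,         # baseline autoregressive decoding (s)
    "mean time essg autoth": 16.5029,  # self-speculative decoding, auto threshold (s)
}

speedup = results["mean time base"] / results["mean time essg autoth"]
print(f"E2E mean speed up essg autoth: {speedup:.4f}")  # 1.5766
```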