w32zhong closed 4 months ago
We recommend checking that your transformers environment is consistent with ours. You do not need to average over all the rows; just read the last output row directly. Here are the results of our recent tests on the various ROUGE metrics: data 999, {'mean rouge-2 base': '0.1062', 'mean rouge-2 essg autoth': '0.1078', 'mean rouge-1 base': '0.2628', 'mean rouge-1 essg autoth': '0.2660', 'mean rouge-L base': '0.1806', 'mean rouge-L essg autoth': '0.1831', 'mean time base': '26.0185', 'mean time essg autoth': '16.5029', 'E2E mean speed up essg autoth': '1.5766', 'mean token time base': '0.0508', 'mean token time essg autoth': '0.0322', 'E2E mean token speed up essg autoth': '1.5766', 'mean matchness essg autoth': '0.9187', 'mean num_drafted_tokens essg autoth': '461.6900'}
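As a sanity check, the reported end-to-end speedup can be recomputed from the two mean times in that last row. This is a minimal sketch, assuming the row is available as a Python dict of strings as printed above; the names last_row and speedup are illustrative and do not come from the notebook.

```python
# Values copied from the last output row quoted above (kept as strings, as printed).
last_row = {
    "mean time base": "26.0185",
    "mean time essg autoth": "16.5029",
    "mean token time base": "0.0508",
    "mean token time essg autoth": "0.0322",
}


def speedup(row, base_key, method_key):
    """Ratio of baseline time to method time (hypothetical helper)."""
    return float(row[base_key]) / float(row[method_key])


e2e_speedup = speedup(last_row, "mean time base", "mean time essg autoth")
token_speedup = speedup(
    last_row, "mean token time base", "mean token time essg autoth"
)

print(f"E2E speedup: {e2e_speedup:.4f}")
print(f"Per-token speedup: {token_speedup:.4f}")
```

The first ratio reproduces the reported 1.5766. The per-token ratio comes out slightly different because the token times in the row are rounded to four decimal places.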
Hi, I am trying to replicate the results by running the evaluate_sum.ipynb notebook. Here is what I get:
As I understand it, this method should achieve a speedup of at most 1.5x for a 7B model. Does that mean I have to average over all data rows to get that number?
Thanks in advance.