Ishiki-Iroha closed this issue 11 months ago.
We use the ShareGPT dataset to train EAGLE. The ShareGPT dataset description says it was cleaned by "removing excessive unicode (indicative of Chinese or Korean text, usually)." Therefore, Chinese tasks are completely out of distribution for EAGLE.
If you are interested in Chinese tasks, we suggest training EAGLE on a Chinese corpus.
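When assembling or auditing a training corpus, it can help to measure how much of each text is Chinese or Korean, since that is the kind of content the ShareGPT cleaning step removed. The sketch below is a hypothetical heuristic, not the actual ShareGPT cleaning script: the character ranges, the function names `cjk_ratio` and `is_mostly_cjk`, and the 0.3 threshold are all illustrative assumptions.

```python
# Hypothetical heuristic; NOT the actual ShareGPT cleaning script.
# Unicode blocks treated as "Chinese or Korean" for this sketch.
CJK_HANGUL_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (common Chinese characters)
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0xAC00, 0xD7AF),  # Hangul Syllables (Korean)
]


def is_cjk_or_hangul(ch: str) -> bool:
    """Return True if the character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_HANGUL_RANGES)


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Chinese or Hangul."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk_or_hangul(c) for c in chars) / len(chars)


def is_mostly_cjk(text: str, threshold: float = 0.3) -> bool:
    # The 0.3 threshold is an arbitrary illustrative value (assumption).
    return cjk_ratio(text) >= threshold


if __name__ == "__main__":
    for sample in ["Hello, world", "你好世界 hi", "안녕하세요"]:
        print(repr(sample), round(cjk_ratio(sample), 3), is_mostly_cjk(sample))
```

A filter like this, with the comparison inverted, could also be used to select Chinese-heavy conversations when building a Chinese training set; the right threshold would depend on the corpus.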
We tested a small number of Chinese tasks (about 50) on Vicuna (7B and 13B) and found that the acceleration ratio on Chinese tasks was lower than on English tasks. Is this expected? Here are some results: