SafeAILab / EAGLE

Official Implementation of EAGLE
https://arxiv.org/pdf/2406.16858
Apache License 2.0

Can EAGLE improve inference throughput in continuous batching mode? #49

Closed. xiongqisong closed this issue 3 months ago

xiongqisong commented 3 months ago

I have a question for the authors. In continuous batching mode, when the compute is already fully utilized, is there no way for EAGLE to further increase inference throughput? After EAGLE generates candidate tokens/sequences, the verification phase must still call the original model. EAGLE essentially spends "the compute of x autoregressive heads in the generation phase" plus "the compute of one forward pass of the original model" to generate multiple tokens, which reduces inference latency. Since verification still demands a relatively large amount of compute, does the throughput gain become less significant? If using EAGLE requires reducing the batch size, it may not be cost-effective. Should we run a baseline test: under continuous batching without EAGLE, fully utilize the compute and measure throughput; then, with EAGLE, fully utilize the compute and measure both the throughput and the largest batch size it can support?
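A rough sketch of the comparison proposed above, in Python. Everything here is an assumption for illustration: `generate_batch` is a hypothetical callable (not an EAGLE API) that takes a batch of prompts and returns the number of new tokens produced for each one, and the loop uses simple static batches rather than a real continuous-batching scheduler. The idea is to run the same sweep once with a baseline backend and once with an EAGLE-enabled backend, then compare the best tokens-per-second figures.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical backend signature: prompts in, new-token counts out.
GenerateFn = Callable[[Sequence[str]], List[int]]


def measure_throughput(
    generate_batch: GenerateFn, prompts: Sequence[str], batch_size: int
) -> float:
    """Return generated tokens per second over all prompts."""
    total_tokens = 0
    start = time.perf_counter()
    # Static batching for simplicity; a serving engine would refill slots
    # as individual sequences finish.
    for i in range(0, len(prompts), batch_size):
        total_tokens += sum(generate_batch(prompts[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def best_throughput(
    generate_batch: GenerateFn, prompts: Sequence[str], batch_sizes: Sequence[int]
) -> Tuple[int, float]:
    """Sweep batch sizes and return (best_batch_size, tokens_per_second)."""
    results: Dict[int, float] = {
        bs: measure_throughput(generate_batch, prompts, bs) for bs in batch_sizes
    }
    best = max(results, key=results.get)
    return best, results[best]
```

In practice the sweep would stop at the largest batch size that fits in memory for each configuration, so the two runs may end at different maximum batch sizes.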

Liyuhui-12 commented 3 months ago

We discussed this issue in Section 4.4 of our paper and the experimental results show that EAGLE increases throughput by 2x.

xiongqisong commented 3 months ago

Thanks for your reply. I read Section 4.4: EAGLE can improve throughput in batch mode. I was wondering whether EAGLE reduces the batch size compared with before. For example, if the machine can handle a batch size of 16 without EAGLE, would the batch size drop to something like 8 or 10 with EAGLE? But I think this will not happen, because EAGLE reduces the compute needed for the same batch, right?

Liyuhui-12 commented 3 months ago

The maximum batch size may slightly decrease, but the throughput still increases. This is also discussed in Section 4.4 of our paper.
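To see how a smaller batch can still yield higher throughput, here is a back-of-the-envelope model in Python. All numbers are made up for illustration and are not taken from the paper; the model simply assumes throughput is batch size times tokens accepted per verification step, divided by step time.

```python
def throughput(batch_size: int, tokens_per_step: float, step_ms: float) -> float:
    """Tokens per second for a simple model: every sequence in the batch
    gains `tokens_per_step` tokens each time a step of `step_ms` completes."""
    return batch_size * tokens_per_step * 1000 / step_ms


# Hypothetical baseline: batch of 16, one token per step, 50 ms per step.
baseline = throughput(batch_size=16, tokens_per_step=1.0, step_ms=50)

# Hypothetical speculative run: smaller batch of 12 and a slower 70 ms step
# (drafting plus verification), but 3 accepted tokens per step on average.
speculative = throughput(batch_size=12, tokens_per_step=3.0, step_ms=70)

print(f"baseline:    {baseline:.0f} tokens/s")
print(f"speculative: {speculative:.0f} tokens/s")
print(f"ratio:       {speculative / baseline:.2f}x")
```

Under this model the gain holds whenever the increase in accepted tokens per step outweighs the combined loss from the smaller batch and the longer step.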

xiongqisong commented 3 months ago

Thanks again. Now I understand clearly. @Liyuhui-12