Closed — YixinSong-e closed this issue 9 months ago
Amazing work! I want to know how many GPUs you use for inference with the 33B model.

Thank you for your interest. The GPU requirements for the 33B model depend on the precision used. With FP16 precision, the weights of the 33B model alone require 66GB of VRAM, and additional memory is needed for the KV cache, attention, and other components. For vanilla autoregressive generation of long sequences, 4x RTX 3090 GPUs are needed, and EAGLE likewise uses 4x RTX 3090 GPUs for inference.
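The 66GB figure follows from a standard back-of-the-envelope calculation: parameter count times bytes per parameter. A minimal sketch (this is a generic estimate of weight memory only, not the repo's actual memory accounting, and it ignores KV cache and activation overhead):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate VRAM (GB) needed just to hold the model weights.

    params_billion * 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param

# FP16 uses 2 bytes per parameter, so a 33B model needs about 66 GB
# for weights alone -- more than the 24 GB of a single RTX 3090,
# hence splitting across 4 cards (4 x 24 GB = 96 GB total).
print(weight_vram_gb(33, 2))  # -> 66.0
```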