Closed — YixinSong-e closed this issue 9 months ago
Amazing work! I want to know how many GPUs you use for inference with the 33B model.

Thank you for your interest. The GPU requirements for the 33B model depend on the precision used. With FP16 precision, the weights of the 33B model alone require 66GB of VRAM, and additional memory is needed for the KV cache, attention, and other components. For vanilla autoregressive generation of long sequences, 4x RTX 3090 GPUs are needed, and EAGLE likewise uses 4x RTX 3090 GPUs for inference.
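The 66GB figure follows from a standard back-of-the-envelope calculation: parameter count times bytes per parameter. A minimal sketch (this is a generic estimate of weight memory only, not the repo's actual memory accounting, and it ignores KV cache and activation overhead):

```python
def weight_vram_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate VRAM (GB) needed just to hold the model weights.

    params_billion * 1e9 params * bytes_per_param bytes / 1e9 bytes-per-GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param

# FP16 uses 2 bytes per parameter, so a 33B model needs about 66 GB
# for weights alone -- more than the 24 GB of a single RTX 3090,
# hence splitting across 4 cards (4 x 24 GB = 96 GB total).
print(weight_vram_gb(33, 2))  # -> 66.0
```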