Closed VladAndronik closed 10 months ago
Unfortunately, yes. Emu2 is a 37-billion-parameter model, so it requires approximately 138GB of memory in float32 precision. The Hugging Face versions of Emu2-Chat and Emu2 are stored in float32.
We have just released the native PyTorch version of the models in bf16 precision, which requires only 70GB of memory. You can try it out by following the instructions.
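The two figures above follow directly from the parameter count and dtype width; a quick back-of-envelope check (raw weight storage only, ignoring activations and framework overhead):

```python
# Rough weight-memory estimate for a 37B-parameter model. The "~138GB"
# and "~70GB" figures in this thread match raw weight storage in
# float32 and bfloat16 respectively.
PARAMS = 37e9

def weight_gib(params: float, bytes_per_param: int) -> float:
    """Raw weight storage in GiB for a given parameter dtype width."""
    return params * bytes_per_param / 2**30

fp32 = weight_gib(PARAMS, 4)   # float32: 4 bytes per parameter
bf16 = weight_gib(PARAMS, 2)   # bfloat16: 2 bytes per parameter
print(f"float32: {fp32:.1f} GiB, bfloat16: {bf16:.1f} GiB")
```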
P.S. The native PyTorch version is not compatible with the huggingface version. Please use the latest code in this repo to load it.
Thank you. So I would also need 70GB of video memory? Would it be possible to load it into 24GB with quantization?
Please follow the quantization instructions. This requires approximately 22GB of RAM and 22GB of VRAM.
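For readers who want the general shape of such a setup, here is a minimal sketch using the standard transformers/bitsandbytes integration. This is not the repo's official script; the model ID and the per-device memory caps are assumptions, so follow the quantization instructions linked above for the supported path:

```python
# Hypothetical sketch: 4-bit quantized loading with CPU offload via
# transformers + bitsandbytes. Model ID and memory caps are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_compute_dtype=torch.bfloat16,  # do matmuls in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Emu2-Chat",                        # assumed model ID
    quantization_config=quant_config,
    device_map="auto",                       # split layers across GPU and CPU
    max_memory={0: "22GiB", "cpu": "22GiB"}, # cap usage per device
    trust_remote_code=True,
)
```

The `max_memory` caps mirror the ~22GB RAM / ~22GB VRAM split mentioned above; `device_map="auto"` then places as many layers as fit on the GPU and offloads the rest to CPU.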
Did you investigate whether it affects performance too much, e.g. causes additional hallucinations?
Regarding the impact of quantization on performance, we have not conducted thorough verification. In the few cases we have tested, the quantized model's answers are not as detailed as the original model's, but overall the outputs are still correct.
Thank you for the response!
I'm trying to run your demo, but I get a "No space left on device" error while loading the model; it takes more than 60GB. On your HF page there seem to be 15 files of 10GB each. Are they all needed?
Thanks!
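(Not an official answer, but note that 15 shards of ~10GB each is consistent with the ~138GB float32 checkpoint discussed above, so all shards would be needed to assemble the full weights. To see where the disk space went, standard tools suffice; the path below is the default Hugging Face cache location and may differ if you moved it.)

```shell
# Free space on the filesystem holding your home directory / model cache
df -h ~

# Size of the Hugging Face download cache (default location; relocate it
# with the HF_HOME or HF_HUB_CACHE environment variables if needed)
du -sh ~/.cache/huggingface 2>/dev/null || true
```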