OOM - Githubissues

AI4Finance-Foundation / FinGPT

FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.

MIT License

13.48k stars 1.88k forks source link

Hi, phalexo. Maybe you should check your strategy for loading the model. Lets's take this notebook as an example.

Usually, you can load the model in this way:

model = LlamaForCausalLM.from_pretrained(
                         base_model, 
                         trust_remote_code = True, 
                         device_map = "cuda:0", 
                        )

However, this code is only for loading the model into a single GPU (first one) with FP16/FP32/BF16. If you want to load the model into more GPUs. You can change device_map to auto. If it doesn't work or the allocation of VRAM is not balanced, you may also set the device_map to balanced. And if you want to use quantization, you may set the hyperparameter load_in_8bit = True or load_in_4bit = True. So here comes the recommended version for multi-GPUs with quantization:

model = LlamaForCausalLM.from_pretrained(
                          base_model, 
                          trust_remote_code = True, 
                          load_in_8bit = True,
                          device_map = "auto", 
                        )

For more details you may check here and here

AI4Finance-Foundation / FinGPT

OOM #82