Open manucpbon opened 1 month ago
Hello, thank you so much for using our tool! Unfortunately, I don't think you will be able to run our model on your setup. One way to reduce memory usage is to set `load_8bit=True` in the `inference.py` script, which runs the model at lower precision. However, based on our testing, inference requires at least 10 GB (~9536.74 MiB) of GPU memory with `load_8bit=True` and 30 GB with `load_8bit=False`. If you have two 8 GB GPUs, the memory usage can be split between them and you should be able to run it.
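For reference, a `load_8bit`-style flag usually maps to Hugging Face transformers / bitsandbytes 8-bit loading under the hood, and `device_map="auto"` is what lets accelerate shard the layers across however many GPUs are visible (e.g. two 8 GB cards). The sketch below illustrates that general approach only; the model name, prompt, and generation settings are placeholders and this is not the actual code in `inference.py`:

```python
# Rough sketch (not the repo's inference.py): load a llama2-7B checkpoint in
# 8-bit and let accelerate shard it across all visible GPUs via device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # roughly 10 GB instead of ~30 GB in fp16
    device_map="auto",          # split layers across the available GPUs
    torch_dtype=torch.float16,  # keep non-quantized modules in half precision
)

prompt = "Hello, world!"  # placeholder input
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With `device_map="auto"` the placement across GPUs is handled automatically, so no manual model-parallel code should be needed; again, this is only a sketch of the technique, not a patch for the repository's script.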
Hi! I am trying to run the `inference.py` script with the "testing" inputs, but I'm having memory issues. I'm running it locally, on a computer with an NVIDIA GeForce RTX 3060 GPU (8192 MiB max), using the llama2-7B model. When I run `python inference.py -i ./testing/input/ -o ./testing/output/` I get the following messages. Is my GPU memory not enough to run this locally? Is there any way you can help me?
Thanks in advance and congratulations on the amazing paper and results!