CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory

root@cs:/home# ./qwen.cpp/build/bin/main -m qwen72b-ggml.bin --tiktoken qwen-72b-raw/qwen.tiktoken -i
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0
  Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory
current device: 0

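The above is the error output. I am running the quantized 72B model; the model file is under 40 GB. A single 80 GB card was not enough, so I switched to two cards, but the error is raised before the second card is ever used. Could anyone offer some pointers?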
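To narrow this down, it may help to check how much memory is actually free on each card right before launching `main`, since the log only says that device 0 ran out. Below is a minimal sketch, not part of qwen.cpp, that assumes `nvidia-smi` is on the PATH and reports plain integer MiB values. The helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical and exist only for this illustration.

```python
import shutil
import subprocess

# nvidia-smi query for per-GPU memory, as CSV without header or units (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, name, used, total' CSV lines into a list of dicts.

    Assumes the GPU name contains no comma and that the memory fields
    are plain integers (hypothetical helper, for illustration only).
    """
    rows = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        rows.append(
            {
                "index": int(index),
                "name": name,
                "used_mib": int(used),
                "free_mib": int(total) - int(used),
            }
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return parsed rows, or [] if it is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpu_memory():
        print(f"GPU {gpu['index']} ({gpu['name']}): "
              f"{gpu['used_mib']} MiB used, {gpu['free_mib']} MiB free")
```

If device 0 shows little free memory before the model is loaded, another process may already be occupying it. That is only one possible explanation, and the numbers would not by themselves show why the second card is left unused.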

QwenLM / qwen.cpp, issue #55