QwenLM / qwen.cpp

C++ implementation of Qwen-LM

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory #55

Open youngallien opened 7 months ago

youngallien commented 7 months ago

root@cs:/home# ./qwen.cpp/build/bin/main -m qwen72b-ggml.bin --tiktoken qwen-72b-raw/qwen.tiktoken -i
ggml_init_cublas: found 2 CUDA devices:
  Device 0: NVIDIA A800 80GB PCIe, compute capability 8.0
  Device 1: NVIDIA A800 80GB PCIe, compute capability 8.0

CUDA error 2 at /home/qwen.cpp/third_party/ggml/src/ggml-cuda.cu:7196: out of memory
current device: 0


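The above is the error output. I am running the quantized 72B model, whose model file is under 40 GB. A single 80 GB card was not enough, so I tried two cards, but the error is raised before the second card is ever used. Could anyone give me some pointers?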