THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

GPU memory usage during inference inside a Docker container #556

Open qingfengfenga opened 12 months ago

qingfengfenga commented 12 months ago

Is your feature request related to a problem? Please describe.

According to the documentation, inference with the unquantized model should need only 13 GB of GPU memory.

However, on a machine with 16 GB of RAM and a 22 GB GPU, loading the unquantized model inside a Docker container fails with the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB. GPU 0 has a total capacty of 22.00 GiB of which 19.50 GiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 1.32 GiB is allocated by PyTorch, and 1.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Solutions

Be able to load the unquantized model using the 22 GB of GPU memory.
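Not part of the original report, but two things the error message and the repo README suggest trying, sketched below: setting `PYTORCH_CUDA_ALLOC_CONF` with `max_split_size_mb` (as the traceback itself hints, to reduce allocator fragmentation), and loading the model in FP16 as described in the ChatGLM2-6B README. The split size of 128 MB is an illustrative value, not a recommendation from the repo; treat this as a minimal sketch, not a confirmed fix for this issue.

```python
import os

# Hint from the CUDA OOM message: cap the allocator's block split size to
# limit fragmentation. Must be set before torch initializes CUDA.
# 128 MB is an assumed example value; tune for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"


def load_chatglm2_fp16():
    """Load ChatGLM2-6B in FP16, following the usage shown in the repo README.

    Lazy import keeps this sketch importable on machines without a GPU
    or without `transformers` installed.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "THUDM/chatglm2-6b", trust_remote_code=True
    )
    # FP16 load (~13 GB of GPU memory per the docs). If that still OOMs,
    # the README's quantized variants, e.g. .quantize(4) before .cuda(),
    # reduce the footprint further.
    model = AutoModel.from_pretrained(
        "THUDM/chatglm2-6b", trust_remote_code=True
    ).half().cuda()
    return tokenizer, model
```

Note that the environment variable only takes effect if it is set before PyTorch first touches CUDA, which is why it comes before any `torch`/`transformers` import here.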

Additional context

No response