Is your feature request related to a problem? Please describe.
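According to the documentation, inference with the unquantized model needs only 13 GB of GPU memory.
The machine has 16 GB of RAM and 22 GB of GPU memory. Loading the unquantized model inside a Docker container fails with the following error: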
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 214.00 MiB. GPU 0 has a total capacty of 22.00 GiB of which 19.50 GiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 1.32 GiB is allocated by PyTorch, and 1.93 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
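A note on the log above, offered as an observation rather than a diagnosis: the "memory in use" figure of 17179869184.00 looks like a raw byte count printed with a GiB label. The sketch below only does the unit conversion. The variable names are illustrative, and whether the match with the 16 GB of host RAM is meaningful or a coincidence is an open assumption.

```python
# Figure copied from the error message, where it is labelled "GiB".
reported = 17179869184

# Treat the same number as a byte count and convert it to GiB.
BYTES_PER_GIB = 1024 ** 3
as_gib = reported / BYTES_PER_GIB

print(f"{reported} bytes = {as_gib:.2f} GiB")
```

Read as bytes, the figure is exactly 16 GiB, the same size as the machine's RAM, while the GPU itself reports 19.50 GiB free out of 22.00 GiB.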
Solutions
Be able to load the unquantized model on a GPU with 22 GB of memory.
Additional context
No response