Environment
- OS: Windows 10
- Python: 3.11.3
- Transformers: 4.30.2
- PyTorch: 2.1.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
Is there an existing issue for this?
Current Behavior
1. Running web_demo.py exits immediately, with no error, once it reaches model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True).
2. Appending .quantize(4).cuda() does not help either.
3. Changing the call to model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True, device='cuda') raises:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacty of 11.00 GiB of which 6.39 GiB is free. Of the allocated memory 3.06 GiB is allocated by PyTorch, and 1.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
4. nvidia-smi output is as follows
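For reference, a minimal sketch of the loading order I am trying, with the allocator hint from the OOM message applied. Assumptions: `max_split_size_mb:128` is just an example value, the env var must be set before the first CUDA allocation, and INT4 quantization is applied before moving the model to the GPU; I cannot run this end to end without the weights.

```python
import os

# Assumption: set the allocator config before any CUDA allocation,
# as the OOM message suggests, to reduce fragmentation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

MODEL_PATH = r"D:\ChatGLM\model\2"  # local path from this report

def load_quantized(path=MODEL_PATH):
    """Load the model with INT4 quantization, then move it to the GPU.

    Assumption: quantize(4) before .cuda() so the full-precision
    weights are never resident on the device.
    """
    from transformers import AutoModel  # imported lazily; needs the local weights
    model = AutoModel.from_pretrained(path, trust_remote_code=True)
    model = model.quantize(4).cuda()
    return model.eval()
```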
Expected Behavior
No response
Steps To Reproduce
A fresh download and setup.
Anything else?
No response