Environment
- OS: Windows 10
- Python: 3.11.3
- Transformers: 4.30.2
- PyTorch: 2.1.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
Is there an existing issue for this?
Current Behavior
1. Running web_demo.py exits immediately, with no error, once it reaches model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True).
2. Appending .quantize(4).cuda() does not help either.
3. Changing the call to model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True, device='cuda') raises:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacty of 11.00 GiB of which 6.39 GiB is free. Of the allocated memory 3.06 GiB is allocated by PyTorch, and 1.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
4. nvidia-smi output is as follows
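For reference, a minimal sketch of the loading order I am trying, with the allocator hint from the OOM message applied. Assumptions: `max_split_size_mb:128` is just an example value, the env var must be set before the first CUDA allocation, and INT4 quantization is applied before moving the model to the GPU; I cannot run this end to end without the weights.

```python
import os

# Assumption: set the allocator config before any CUDA allocation,
# as the OOM message suggests, to reduce fragmentation.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

MODEL_PATH = r"D:\ChatGLM\model\2"  # local path from this report

def load_quantized(path=MODEL_PATH):
    """Load the model with INT4 quantization, then move it to the GPU.

    Assumption: quantize(4) before .cuda() so the full-precision
    weights are never resident on the device.
    """
    from transformers import AutoModel  # imported lazily; needs the local weights
    model = AutoModel.from_pretrained(path, trust_remote_code=True)
    model = model.quantize(4).cuda()
    return model.eval()
```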
Expected Behavior
No response
Steps To Reproduce
A fresh download and setup.
Anything else?
No response