THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/Help] model = AutoModel.from_pretrained("D:\\ChatGLM\\model\\2", trust_remote_code=True).cuda() exits immediately without an error #613

Open zhans1099 opened 11 months ago

zhans1099 commented 11 months ago

Is there an existing issue for this?

Current Behavior

1. Running `web_demo.py` reaches `model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True)` and then the process exits immediately, with no error message (screenshot attached).

2. Adding `.quantize(4).cuda()` does not help either.

3. Changing it to `model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True, device='cuda')` instead raises: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB. GPU 0 has a total capacty of 11.00 GiB of which 6.39 GiB is free. Of the allocated memory 3.06 GiB is allocated by PyTorch, and 1.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. (A sketch of this allocator hint follows the list.)

4. `nvidia-smi` output:

[screenshot: nvidia-smi output]
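
For reference, a minimal sketch of the load path items 2–3 describe, combined with the `max_split_size_mb` allocator hint that the OOM message itself suggests. The local checkpoint path is the one from this report, the `128` value is an assumed starting point, and `.quantize(4)` is the on-the-fly 4-bit quantization method provided by ChatGLM2-6B's bundled modeling code:

```python
import os

# Allocator hint taken from the OOM message above; it must be set before
# CUDA is initialized, i.e. before torch is imported in this process.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

from transformers import AutoModel, AutoTokenizer

MODEL_PATH = r"D:\ChatGLM\model\2"  # local checkpoint path from this report

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

# Quantize the weights to 4 bits before moving them to the GPU; this cuts
# weight memory to roughly 4 GB, which fits comfortably on an 11 GiB card.
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = model.quantize(4).cuda().eval()
```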

Expected Behavior

No response

Steps To Reproduce

Freshly downloaded and configured.

Environment

- OS: Windows 10
- Python: 3.11.3
- Transformers: 4.30.2
- PyTorch: 2.1.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True

Anything else?

No response

zhans1099 commented 11 months ago

`model = AutoModel.from_pretrained("D:\ChatGLM\model\2", trust_remote_code=True).cuda()` exits immediately without an error.

dancruiser commented 11 months ago

I ran into this problem too. Have you solved it? Which GPU do you have? Mine is an MX250.

dogvane commented 11 months ago

Try downloading the separately released Q4 (INT4) model instead. With the unquantized version, VRAM usage on my Windows machine is 12.5 GB once startup finishes. An MX250 shouldn't bother with GPU inference at all; run it on the CPU instead.
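
A minimal sketch of that pre-quantized route, assuming the INT4 checkpoint id `THUDM/chatglm2-6b-int4` from the project's README; the commented-out line is the CPU `.float()` fallback the README documents for machines with too little VRAM:

```python
from transformers import AutoModel, AutoTokenizer

# Pre-quantized INT4 release of ChatGLM2-6B (repo id assumed per the README);
# it never materializes the full fp16 weights, so RAM and VRAM needs both drop.
MODEL_ID = "THUDM/chatglm2-6b-int4"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True).cuda().eval()

# CPU fallback for weak GPUs such as the MX250 mentioned above (slow, but works):
# model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float().eval()

# Quick smoke test using the chat API from the README.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```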