THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[Help] <DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.> #631

Open · nor1take opened this issue 1 year ago

nor1take commented 1 year ago

Is there an existing issue for this?

Current Behavior

The error output is as follows:

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Traceback (most recent call last):
  File "E:\PycharmProjects\ChatGLM\cli_demo.py", line 7, in <module>
    model = AutoModel.from_pretrained("E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
    return model_class.from_pretrained(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 1047, in __init__
    self.transformer = ChatGLMModel(config, empty_init=empty_init)
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in __init__
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 844, in <listcomp>
    [get_layer(layer_id) for layer_id in range(self.num_layers)]
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 829, in get_layer
    return GLMBlock(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 598, in __init__
    self.mlp = GLU(
  File "C:\Users\lenovo/.cache\huggingface\modules\transformers_modules\chatglm-6b-int4\modeling_chatglm.py", line 531, in __init__
    self.dense_4h_to_h = init_method(
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
    return module_cls(*args, **kwargs).to_empty(device=final_device)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\nn\modules\module.py", line 1024, in <lambda>
    return self._apply(lambda t: torch.empty_like(t, device=device))
  File "E:\PycharmProjects\ChatGLM\venv\lib\site-packages\torch\_refs\__init__.py", line 4254, in empty_like
    return torch.empty_strided(
RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 134217728 bytes.
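
For what it's worth, the failed size decodes exactly: assuming the stock chatglm-6b config values (hidden_size=4096, inner_hidden_size=16384) and 2-byte fp16 parameters, 134217728 bytes is precisely one `dense_4h_to_h` weight matrix, which matches the frame in the traceback. In other words, the process runs out of CPU memory while materializing an ordinary layer, not on one unusually large tensor:

# Back-of-the-envelope check (config values assumed from chatglm-6b defaults)
inner_hidden_size = 16384
hidden_size = 4096
fp16_bytes = 2
print(inner_hidden_size * hidden_size * fp16_bytes)  # 134217728, the size in the error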

Expected Behavior

No response

Steps To Reproduce

# cli_demo.py
from transformers import AutoModel, AutoTokenizer

# Raw strings avoid backslash-escape surprises in Windows paths
tokenizer = AutoTokenizer.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained(r"E:\ChatGLM-6B\chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
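
Not a fix from the maintainers, just a sketch of a pre-flight check: the traceback shows the allocation failing on the CPU while layers are being materialized (before the `.cuda()` move), so it can help to verify free RAM up front. `psutil` is a third-party package (`pip install psutil`), and the threshold below is a rough assumption, not a measured requirement:

# preflight.py -- hypothetical helper, not part of the ChatGLM-6B repo
import psutil

MIN_FREE_BYTES = 6 * 2**30  # rough guess at the transient load-time peak

avail = psutil.virtual_memory().available
if avail < MIN_FREE_BYTES:
    raise MemoryError(
        f"only {avail / 2**30:.1f} GiB RAM free; close other programs or "
        "enlarge the Windows page file before loading the model"
    )

A commonly reported workaround for this exact error on Windows is to set a large fixed page file instead of leaving it on "System managed", since the allocator fails against the commit limit rather than physical RAM alone.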

Environment

- OS: Win 10
- Python: 3.10
- Transformers:
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true
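
(Aside: the blank Transformers entry above can be filled in programmatically; both helpers below ship with the named packages.)

# report_env.py -- prints the details the issue template asks for
import transformers
print(transformers.__version__)   # fills in the "Transformers:" line

from torch.utils import collect_env
collect_env.main()                # OS, Python, PyTorch and CUDA details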

Anything else?

RAM: 16 GB

GPU: 7.9 GB of VRAM

Virtual memory: set to "System managed"
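
Since the page file is left on "System managed", one way to see the actual commit limit the allocator runs up against is the Win32 `GlobalMemoryStatusEx` call; a minimal ctypes sketch (plain Windows API, nothing repo-specific):

# memstat.py -- Windows-only sketch using the documented MEMORYSTATUSEX layout
import ctypes

class MEMORYSTATUSEX(ctypes.Structure):
    _fields_ = [
        ("dwLength", ctypes.c_ulong),
        ("dwMemoryLoad", ctypes.c_ulong),
        ("ullTotalPhys", ctypes.c_ulonglong),
        ("ullAvailPhys", ctypes.c_ulonglong),
        ("ullTotalPageFile", ctypes.c_ulonglong),
        ("ullAvailPageFile", ctypes.c_ulonglong),
        ("ullTotalVirtual", ctypes.c_ulonglong),
        ("ullAvailVirtual", ctypes.c_ulonglong),
        ("ullAvailExtendedVirtual", ctypes.c_ulonglong),
    ]

stat = MEMORYSTATUSEX()
stat.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(stat))
print(f"physical: {stat.ullAvailPhys / 2**30:.1f} / {stat.ullTotalPhys / 2**30:.1f} GiB free")
print(f"commit:   {stat.ullAvailPageFile / 2**30:.1f} / {stat.ullTotalPageFile / 2**30:.1f} GiB free")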

picasso250 commented 1 year ago

I ran into the same problem, also with 16 GB of RAM.

- OS: Win 11
- Python: 3.11
- Transformers:
- PyTorch: 2.0.0+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : true

mayflyfy commented 1 year ago

... Even an i7-12700K + ROG 3080 12GB + 32 GB of RAM can't run it.

nor1take commented 1 year ago

@mayflyfy That's honestly a bit ridiculous.

mayflyfy commented 1 year ago

> @mayflyfy That's honestly a bit ridiculous.

This build cost me around 18,000 RMB. With every program closed and after a reboot, the INT8 model can just barely start; open a browser and it won't start anymore. Can you believe that?

In the cyber era, playing with LLMs is not for the broke (doge).