THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/MPS] Mac M2 Max P-tuning Error #417

Open · NacolZero opened this issue 1 year ago

NacolZero commented 1 year ago

Is there an existing issue for this?

Current Behavior

P-tuning on a Mac M2 Max fails. Judging from the error messages, the local model was not used, which pulled in cpm_kernels, a package that is not supported on macOS, but I have not figured out why the local model was not picked up.

Terminal:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:05<00:00,  1.26it/s]
[INFO|modeling_utils.py:3295] 2023-08-02 21:39:09,973 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3297] 2023-08-02 21:39:09,973 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /xxx/xxx/xxx/xxx/chatglm2-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2927] 2023-08-02 21:39:09,975 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
08/02/2023 21:39:09 - WARNING - transformers_modules.chatglm2-6b.quantization - Failed to load cpm_kernels:Unknown platform: darwin
Traceback (most recent call last):
  File "/xxx/xxx/xxx/xxx/ChatGLM2-6B/ptuning/main.py", line 411, in <module>
    main()
  File "/xxx/xxx/xxx/xxx/ChatGLM2-6B/ptuning/main.py", line 127, in main
    model = model.quantize(model_args.quantization_bit)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 1191, in quantize
    self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/quantization.py", line 157, in quantize
    weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 15295) of binary: /Users/xxx/Projects/llm/ChatGLM2-6B/venv/bin/python
Traceback (most recent call last):
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
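In short, quantization.py moves the weights with torch.cuda.current_device(), which asserts on a PyTorch build without CUDA support. A minimal, self-contained sketch of the failure mode and of a guarded device pick that avoids the hard CUDA call (illustrative only, not the repository's code):

```python
import torch

# quantization.py effectively does weight.to(torch.cuda.current_device());
# on a CPU/MPS-only PyTorch build that raises
# "AssertionError: Torch not compiled with CUDA enabled".
# A guarded pick avoids the hard CUDA dependency:
if torch.cuda.is_available():
    device = torch.device("cuda", torch.cuda.current_device())
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(device)  # on an M2 Max with a stock PyTorch 2.x wheel this prints: mps
```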

Steps To Reproduce

  1. ChatGLM2-6B is deployed and works normally (regular Q&A).
  2. The model directory is named "chatglm2-6b". (Note: it is not the int4 version.)
  3. In train.sh, change model_name_or_path to the local model path.
  4. Run the command sh train.sh.

Environment

- OS: macOS Ventura 13.2.1
- Python: 3.11
- Transformers: 4.30.2
- PyTorch: 2.0.1
- CUDA Support: False

B1ACK917 commented 1 year ago

The int4-quantized operators are implemented in CUDA. To run on macOS you need to drop the quantization argument: remove --quantization_bit 4 from train.sh and the tuning should go through.
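For context, a hypothetical helper that mirrors the gating in ptuning/main.py (paraphrased from the traceback above; details may differ): quantization is only attempted when --quantization_bit is passed, and the resulting int4 kernels (cpm_kernels) are CUDA-only, so omitting the flag never touches that code path.

```python
import torch

def maybe_quantize(model, quantization_bit=None):
    # Hypothetical sketch: no flag means no quantization and no CUDA requirement.
    if quantization_bit is None:
        return model
    # The int4 kernels come from cpm_kernels, which is CUDA-only.
    if not torch.cuda.is_available():
        raise RuntimeError("int4 quantization requires a CUDA build of PyTorch")
    return model.quantize(quantization_bit)
```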

NacolZero commented 1 year ago

> The int4-quantized operators are implemented in CUDA. To run on macOS you need to drop the quantization argument: remove --quantization_bit 4 from train.sh and the tuning should go through.

Thanks, after removing --quantization_bit 4 it does indeed get further.

But now a different error is raised: Placeholder storage has not been allocated on MPS device!

Log line: Process rank: 0, device: cpu, n_gpu

I could not find anywhere in the configuration or the code to set the MPS device. Do you have any ideas?
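For reference, this message usually indicates a device mismatch rather than a missing setting: the model ends up on mps while the input tensors stay on cpu (or vice versa). A minimal illustrative sketch, assuming a plain PyTorch forward pass rather than the ptuning Trainer:

```python
import torch

# Keep the model and its inputs on the same device; a mismatch is what
# triggers "Placeholder storage has not been allocated on MPS device!".
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(8, 2).to(device)   # stand-in for the real model
batch = torch.randn(4, 8).to(device)       # inputs must live on the same device
out = model(batch)
print(out.device)
```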

AliThink commented 1 year ago

> But now a different error is raised: Placeholder storage has not been allocated on MPS device! I could not find anywhere in the configuration or the code to set the MPS device. Do you have any ideas?

Was this issue resolved in the end?

NacolZero commented 1 year ago

> Was this issue resolved in the end?

I ended up buying a graphics card.