THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型

[BUG/MPS] Mac M2 Max P-tuning Error #417

Open · NacolZero opened this issue 1 year ago

NacolZero commented 1 year ago

Is there an existing issue for this?

Current Behavior

P-tuning on a Mac M2 Max fails. Judging from the error messages, the local model was not used, which pulled in cpm_kernels, a package that is not supported on macOS, but I have not figured out why the local model was not picked up.

Terminal:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:05<00:00,  1.26it/s]
[INFO|modeling_utils.py:3295] 2023-08-02 21:39:09,973 >> All model checkpoint weights were used when initializing ChatGLMForConditionalGeneration.

[WARNING|modeling_utils.py:3297] 2023-08-02 21:39:09,973 >> Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at /xxx/xxx/xxx/xxx/chatglm2-6b and are newly initialized: ['transformer.prefix_encoder.embedding.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[INFO|modeling_utils.py:2927] 2023-08-02 21:39:09,975 >> Generation config file not found, using a generation config created from the model config.
Quantized to 4 bit
08/02/2023 21:39:09 - WARNING - transformers_modules.chatglm2-6b.quantization - Failed to load cpm_kernels:Unknown platform: darwin
Traceback (most recent call last):
  File "/xxx/xxx/xxx/xxx/ChatGLM2-6B/ptuning/main.py", line 411, in <module>
    main()
  File "/xxx/xxx/xxx/xxx/ChatGLM2-6B/ptuning/main.py", line 127, in main
    model = model.quantize(model_args.quantization_bit)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 1191, in quantize
    self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/.cache/huggingface/modules/transformers_modules/chatglm2-6b/quantization.py", line 157, in quantize
    weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 15295) of binary: /Users/xxx/Projects/llm/ChatGLM2-6B/venv/bin/python
Traceback (most recent call last):
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/xxx/Projects/llm/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/xxx/xxx/ChatGLM2-6B/venv/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
main.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
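In short, quantization.py moves the weights with torch.cuda.current_device(), which asserts on a PyTorch build without CUDA support. A minimal, self-contained sketch of the failure mode and of a guarded device pick that avoids the hard CUDA call (illustrative only, not the repository's code):

```python
import torch

# quantization.py effectively does weight.to(torch.cuda.current_device());
# on a CPU/MPS-only PyTorch build that raises
# "AssertionError: Torch not compiled with CUDA enabled".
# A guarded pick avoids the hard CUDA dependency:
if torch.cuda.is_available():
    device = torch.device("cuda", torch.cuda.current_device())
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(device)  # on an M2 Max with a stock PyTorch 2.x wheel this prints: mps
```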

Steps To Reproduce

  1. ChatGLM2-6B is deployed and works normally (regular Q&A).
  2. The model directory is named "chatglm2-6b". (Note: it is not the int4 version.)
  3. In train.sh, change model_name_or_path to the local model path.
  4. Run the command sh train.sh.

Environment

- OS: macOS Ventura 13.2.1
- Python: 3.11
- Transformers: 4.30.2
- PyTorch: 2.0.1
- CUDA Support: False

B1ACK917 commented 1 year ago

The int4-quantized operators are implemented in CUDA. To run on macOS you need to drop the quantization argument: remove --quantization_bit 4 from train.sh and the tuning should go through.
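For context, a hypothetical helper that mirrors the gating in ptuning/main.py (paraphrased from the traceback above; details may differ): quantization is only attempted when --quantization_bit is passed, and the resulting int4 kernels (cpm_kernels) are CUDA-only, so omitting the flag never touches that code path.

```python
import torch

def maybe_quantize(model, quantization_bit=None):
    # Hypothetical sketch: no flag means no quantization and no CUDA requirement.
    if quantization_bit is None:
        return model
    # The int4 kernels come from cpm_kernels, which is CUDA-only.
    if not torch.cuda.is_available():
        raise RuntimeError("int4 quantization requires a CUDA build of PyTorch")
    return model.quantize(quantization_bit)
```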

NacolZero commented 1 year ago

> The int4-quantized operators are implemented in CUDA. To run on macOS you need to drop the quantization argument: remove --quantization_bit 4 from train.sh and the tuning should go through.

Thanks, after removing --quantization_bit 4 it does indeed get further.

But now a different error is raised: Placeholder storage has not been allocated on MPS device!

Log line: Process rank: 0, device: cpu, n_gpu

I could not find anywhere in the configuration or the code to set the MPS device. Do you have any ideas?
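For reference, this message usually indicates a device mismatch rather than a missing setting: the model ends up on mps while the input tensors stay on cpu (or vice versa). A minimal illustrative sketch, assuming a plain PyTorch forward pass rather than the ptuning Trainer:

```python
import torch

# Keep the model and its inputs on the same device; a mismatch is what
# triggers "Placeholder storage has not been allocated on MPS device!".
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(8, 2).to(device)   # stand-in for the real model
batch = torch.randn(4, 8).to(device)       # inputs must live on the same device
out = model(batch)
print(out.device)
```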

AliThink commented 1 year ago

> But now a different error is raised: Placeholder storage has not been allocated on MPS device! I could not find anywhere in the configuration or the code to set the MPS device. Do you have any ideas?

Was this issue resolved in the end?

NacolZero commented 1 year ago

> Was this issue resolved in the end?

I ended up buying a graphics card.