PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

RuntimeError: (NotFound) The kernel with key (CPU, Undefined(AnyLayout), float16) of kernel `multiply` is not registered. #8685

Closed zhanland closed 2 days ago

zhanland commented 3 days ago

Software environment

paddle2onnx        1.2.4
paddlefsl          1.1.0
paddlenlp          3.0.0b0
paddlepaddle       3.0.0b0

Duplicate issues

Error description

/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2024-06-30 00:30:54,539] [ WARNING] - if you run ring_flash_attention.py, please ensure you install the paddlenlp_ops by following the instructions provided at https://github.com/PaddlePaddle/PaddleNLP/blob/develop/csrc/README.md
[2024-06-30 00:30:56,533] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.tokenizer.Qwen2Tokenizer'> to load 'Qwen/Qwen2-0.5B'.
[2024-06-30 00:30:56,695] [    INFO] - The `unk_token` parameter needs to be defined: we use `eos_token` by default.
[2024-06-30 00:30:56,890] [    INFO] - Adding <|im_start|> to the vocabulary
[2024-06-30 00:30:56,890] [    INFO] - Adding <|im_end|> to the vocabulary
[2024-06-30 00:30:56,890] [    INFO] - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-06-30 00:30:56,891] [    INFO] - We are using <class 'paddlenlp.transformers.qwen2.modeling.Qwen2ForCausalLM'> to load 'Qwen/Qwen2-0.5B'.
[2024-06-30 00:30:56,891] [    INFO] - Loading configuration file /home/liang/.paddlenlp/models/Qwen/Qwen2-0.5B/config.json
[2024-06-30 00:30:56,892] [    INFO] - Loading weights file from cache at /home/liang/.paddlenlp/models/Qwen/Qwen2-0.5B/model.safetensors
[2024-06-30 00:30:57,622] [    INFO] - Loaded weights file from disk, setting weights to model.
[2024-06-30 00:31:40,997] [    INFO] - All model checkpoint weights were used when initializing Qwen2ForCausalLM.

[2024-06-30 00:31:41,001] [ WARNING] - Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at Qwen/Qwen2-0.5B and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2024-06-30 00:31:41,002] [    INFO] - Loading configuration file /home/liang/.paddlenlp/models/Qwen/Qwen2-0.5B/generation_config.json
W0630 00:31:41.038244 50710 multiply_fwd_func.cc:75] got different data type, run type promotion automatically, this may cause data type been changed.
Traceback (most recent call last):
  File "/home/liang/paddle/main.py", line 5, in <module>
    outputs = model.generate(**input_features, max_length=128)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddle/base/dygraph/base.py", line 337, in _decorate_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/generation/utils.py", line 918, in generate
    return self.greedy_search(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/generation/utils.py", line 1072, in greedy_search
    outputs = self(**model_inputs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/transformers/qwen2/modeling.py", line 1337, in forward
    outputs = self.qwen2(
              ^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/transformers/qwen2/modeling.py", line 1068, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/transformers/qwen2/modeling.py", line 655, in forward
    hidden_states = self.input_layernorm(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/liang/miniconda3/envs/paddle_cpu/lib/python3.11/site-packages/paddlenlp/transformers/qwen2/modeling.py", line 304, in forward
    return hidden_states * self.weight
           ~~~~~~~~~~~~~~^~~~~~~~~~~~~
RuntimeError: (NotFound) The kernel with key (CPU, Undefined(AnyLayout), float16) of kernel `multiply` is not registered. Selected wrong DataType `float16`. Paddle support following DataTypes: int64, float64, complex128, float32, complex64, int32, bfloat16, bool.
  [Hint: Expected kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU != true, but received kernel_iter == iter->second.end() && kernel_key.backend() == Backend::CPU:1 == true:1.] (at /paddle/paddle/phi/core/kernel_factory.cc:287)

Steps to reproduce & code

from paddlenlp.transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B", dtype='float16')
input_features = tokenizer("你好!请自我介绍一下。", return_tensors="pd")
outputs = model.generate(**input_features, max_length=128)
tokenizer.batch_decode(outputs[0])
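
The error message lists the dtypes for which the CPU `multiply` kernel is registered, and float16 is not among them. A possible workaround, not confirmed in this thread, is to fall back to float32 when running on CPU. Below is a minimal sketch of that dtype selection. The helper `pick_dtype` and the constant `CPU_MULTIPLY_DTYPES` are hypothetical names introduced here for illustration, not part of Paddle or PaddleNLP.

```python
# Dtypes the error message reports as supported by the CPU `multiply` kernel.
# Copied from the RuntimeError text above; float16 is absent from the list.
CPU_MULTIPLY_DTYPES = {
    "int64", "float64", "complex128", "float32",
    "complex64", "int32", "bfloat16", "bool",
}


def pick_dtype(requested: str, device: str) -> str:
    """Hypothetical helper: return `requested`, or "float32" if the device
    is a CPU and `requested` is not in the supported list above."""
    if device.startswith("cpu") and requested not in CPU_MULTIPLY_DTYPES:
        return "float32"
    return requested


if __name__ == "__main__":
    # On CPU, float16 is replaced by float32; on GPU it is left unchanged.
    print(pick_dtype("float16", "cpu"))
    print(pick_dtype("float16", "gpu:0"))
```

With Paddle installed, the device string could come from `paddle.device.get_device()`, and the result could be passed as the `dtype` argument of `AutoModelForCausalLM.from_pretrained` in the reproduction script. Whether float32 loading resolves the issue on this setup is an assumption that would need to be checked.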