PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: PaddleNLP predict/predictor.py 出现 AttributeError: 'LlamaConfig' object has no attribute 'use_fast_layer_norm' #8934

Open EnflameGCU opened 2 months ago

EnflameGCU commented 2 months ago

Software Environment

- paddlepaddle:
- paddlepaddle-gpu: 3.0.0b1
- paddlenlp: 3.0.0b0.post20240814
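When debugging mismatches like the one reported below, it helps to confirm which wheels are actually installed. This is an illustrative check, not part of the original report; the package names are taken from the list above:

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed version of each package from the environment list,
# or a note when the package is not installed at all.
for pkg in ("paddlepaddle", "paddlepaddle-gpu", "paddlenlp"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```
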

Duplicate Issues

Error Description

Running inference in dynamic-graph (dygraph) mode raises an attribute error:

[2024-08-14 10:09:54,711] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-13b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2024-08-14 10:09:54,716] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-13b/generation_config.json
[2024-08-14 10:09:55,915] [    INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load 'meta-llama/Llama-2-13b'.
[2024-08-14 10:09:55,915] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-13b/config.json
[2024-08-14 10:09:55,916] [    INFO] - Loading configuration file /root/.paddlenlp/models/meta-llama/Llama-2-13b/generation_config.json
[2024-08-14 10:09:55,919] [    INFO] - Start predict
[2024-08-14 10:09:55,931] [ WARNING] - model.generation_config is in conflict with model.config, model.config is used.
Traceback (most recent call last):
  File "/home/***/PaddleNLP/llm/predict/predictor.py", line 1665, in <module>
    predict()
  File "/home/***/PaddleLLM/upstream/PaddleNLP/llm/predict/predictor.py", line 1608, in predict
    outputs = predictor.predict(batch_source_text)
  File "/home/***/PaddleNLP/llm/predict/predictor.py", line 259, in predict
    predictions = self._infer(tokenized_source)
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 337, in _decorate_function
    return func(*args, **kwargs)
  File "/home/***/PaddleNLP/llm/predict/predictor.py", line 306, in _infer
    result = self.model.generate(
  File "/usr/local/lib/python3.10/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 337, in _decorate_function
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/generation/utils.py", line 947, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/generation/utils.py", line 1189, in sample
    outputs = self(**model_inputs)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/transformers/llama/modeling.py", line 1977, in forward
    outputs = self.llama(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/transformers/llama/modeling.py", line 1689, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/transformers/llama/modeling.py", line 1163, in forward
    hidden_states = self.input_layernorm(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/paddle/nn/layer/layers.py", line 1426, in __call__
    return self.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/transformers/llama/modeling.py", line 369, in forward
    hidden_states, self.weight, self.variance_epsilon, self.config.use_fast_layer_norm
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/transformers/configuration_utils.py", line 530, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'LlamaConfig' object has no attribute 'use_fast_layer_norm'
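The failure is a config object missing the `use_fast_layer_norm` flag that `modeling.py` reads unconditionally. Below is a minimal sketch of the usual defensive patterns, using a hypothetical stand-in class (the real `LlamaConfig` lives in `paddlenlp.transformers.llama.configuration`); this is a workaround idea, not an official fix:

```python
class ConfigStub:
    """Hypothetical stand-in for a model config lacking `use_fast_layer_norm`."""
    def __init__(self):
        self.hidden_size = 5120  # illustrative field only

config = ConfigStub()

# Pattern 1: read with a default instead of a bare attribute access,
# which is what raises AttributeError in the traceback above.
use_fast = getattr(config, "use_fast_layer_norm", False)
print(use_fast)  # False

# Pattern 2: patch the config once, before inference starts.
if not hasattr(config, "use_fast_layer_norm"):
    config.use_fast_layer_norm = False
print(config.use_fast_layer_norm)  # False
```

In practice this kind of mismatch often indicates that the checked-out `llm/` scripts are newer (or older) than the installed `paddlenlp` wheel, so aligning the two versions is the cleaner long-term fix.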

Steps to Reproduce & Code

Run the following inference command:

python predict/predictor.py \
    --model_name_or_path meta-llama/Llama-2-13b \
    --device gpu \
    --src_length 300 \
    --max_length 100 \
    --batch_size 4 \
    --use_flash_attention True \
    --dtype float16
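Another way to narrow this down is to check whether the cached `config.json` (path shown in the log above) defines the flag at all. The sketch below is self-contained and uses an inline sample string in place of the real file:

```python
import json

def has_flag(config_text: str, flag: str = "use_fast_layer_norm") -> bool:
    """Return True if the serialized model config defines the given key."""
    return flag in json.loads(config_text)

# Inline stand-in for the cached config.json under
# /root/.paddlenlp/models/meta-llama/Llama-2-13b/; the real file
# would be read with open(...) instead.
sample = '{"hidden_size": 5120, "num_hidden_layers": 40}'
print(has_flag(sample))  # False: the key is absent, so the code must supply a default
```
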

The resulting error is identical to the traceback shown above.
github-actions[bot] commented 1 week ago

This issue is stale because it has been open for 60 days with no activity.