Closed: Rane2021 closed this issue 4 months ago.
~/opt/py38/bin/python export_model.py --model_name_or_path FlagAlpha/Llama2-Chinese-7b-Chat --output_path ./inference --dtype float16
/root/opt/py38/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2023-12-13 16:27:41,659] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/tokenizer_config.json
[2023-12-13 16:27:41,660] [ INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:27:41,660] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/sentencepiece.bpe.model
[2023-12-13 16:27:41,660] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/FlagAlpha/Llama2-Chinese-7b-Chat/added_tokens.json and saved to /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat
[2023-12-13 16:27:41,819] [ WARNING] - file<https://bj.bcebos.com/paddlenlp/models/community/FlagAlpha/Llama2-Chinese-7b-Chat/added_tokens.json> not exist
[2023-12-13 16:27:41,821] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/special_tokens_map.json
[2023-12-13 16:27:41,821] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/tokenizer_config.json
[2023-12-13 16:27:41,862] [ ERROR] - Using pad_token, but it is not set yet.
[2023-12-13 16:27:41,999] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,002] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:27:42,142] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,143] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,446] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/model_state.pdparams
[2023-12-13 16:27:42,447] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/model_state.pdparams
[2023-12-13 16:29:37,609] [ INFO] - Loaded weights file from disk, setting weights to model.
W1213 16:29:37.621098 125450 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.4, Runtime API Version: 11.2
W1213 16:29:37.625649 125450 gpu_resources.cc:149] device: 0, cuDNN Version: 8.1.
[2023-12-13 16:30:17,304] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-13 16:30:17,305] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at FlagAlpha/Llama2-Chinese-7b-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-13 16:30:17,449] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/generation_config.json
[2023-12-13 16:30:17,452] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/generation_config.json
/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/configuration_utils.py:247: UserWarning: using greedy search strategy. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/configuration_utils.py:252: UserWarning: using greedy search strategy. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
[2023-12-13 16:30:18,230] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:30:18,231] [ INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:30:18,376] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:30:18,378] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py:944: UserWarning: What you save is a function, and `jit.save` will generate the name of the model file according to `path` you specify. When loading these files with `jit.load`, you get a `TranslatedLayer` whose inference result is the same as the inference result of the function you saved.
warnings.warn(
Traceback (most recent call last):
  File "export_model.py", line 96, in <module>
    main()
  File "export_model.py", line 84, in main
    predictor.model.to_static(
  File "/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/utils.py", line 1326, in to_static
    paddle.jit.save(model, path)
  File "/root/opt/py38/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py", line 752, in wrapper
    func(layer, path, input_spec, **configs)
  File "/root/opt/py38/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 75, in __impl__
    return func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py", line 1085, in save
    attr_func.concrete_program_specify_input_spec(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py", line 709, in concrete_program_specify_input_spec
    concrete_program, _ = self.get_concrete_program(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py", line 564, in get_concrete_program
    args, kwargs = self._function_spec.unified_args_and_kwargs(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/function_spec.py", line 90, in unified_args_and_kwargs
    raise ValueError(error_msg)
ValueError: The decorated function `generate` requires 4 arguments: ['input_ids', 'generation_config', 'stopping_criteria', 'streamer'], but received 26 with (InputSpec(shape=(-1, -1), dtype=paddle.int64, name=None, stop_gradient=False), InputSpec(shape=(-1, -1), dtype=paddle.int64, name=None, stop_gradient=False), None, InputSpec(shape=(1,), dtype=paddle.int64, name=None, stop_gradient=False), 0, 'sampling', InputSpec(shape=(1,), dtype=paddle.float32, name=None, stop_gradient=False), 0, InputSpec(shape=(1,), dtype=paddle.float32, name=None, stop_gradient=False), 1, 1, 1, 0.0, False, 0, 0, 0, None, None, None, None, 1, 0.0, True, False, False).
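This ValueError is the actual failure: when paddle.jit.save traces the decorated `generate` function, dygraph-to-static first unifies the call arguments against the function's signature, and here `generate` declares 4 parameters while the export path hands it 26 InputSpecs and literals, so tracing aborts before any files are written. Per the comments below, upgrading PaddleNLP resolves this, and the reproduction command at the end also passes --inference_model. A minimal sketch of the underlying contract with a toy layer (all names below are illustrative, not from export_model.py):

    import paddle
    from paddle.static import InputSpec

    class Toy(paddle.nn.Layer):
        def __init__(self):
            super().__init__()
            self.fc = paddle.nn.Linear(8, 2)

        def forward(self, x):
            return self.fc(x)

    net = Toy()
    # One InputSpec per forward() parameter; a mismatched count raises the
    # same "requires N arguments, but received M" ValueError seen above.
    static_net = paddle.jit.to_static(
        net, input_spec=[InputSpec(shape=[None, 8], dtype="float32", name="x")]
    )
    paddle.jit.save(static_net, "./toy/inference")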
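Separately, the ERROR line "Using pad_token, but it is not set yet" earlier in the log appears because the Llama tokenizer ships without a pad token. It does not stop the export, but here is a hedged sketch of a common workaround, assuming the PaddleNLP tokenizer exposes the usual special-token attributes (reusing eos_token as padding is an assumption, not something the script requires):

    from paddlenlp.transformers import LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("FlagAlpha/Llama2-Chinese-7b-Chat")
    if tokenizer.pad_token is None:
        # Borrow an existing special token for padding; decoder-only models
        # commonly reuse eos_token here (assumption, adjust as needed).
        tokenizer.pad_token = tokenizer.eos_token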
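The two UserWarnings about `temperature` and `top_p` come from the checkpoint's generation_config.json, which sets sampling parameters while the resolved decode strategy is greedy search. They are cosmetic for this export; a hedged sketch of silencing them by resetting the fields to their defaults, assuming a GenerationConfig class in paddlenlp.generation with the usual from_pretrained/save_pretrained pair (the output directory is illustrative):

    from paddlenlp.generation import GenerationConfig

    cfg = GenerationConfig.from_pretrained("FlagAlpha/Llama2-Chinese-7b-Chat")
    # Reset sampling-only knobs to their defaults so greedy search stops
    # warning about unused parameters.
    cfg.temperature = 1.0
    cfg.top_p = 1.0
    cfg.save_pretrained("./generation_config_fixed")  # illustrative path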
This issue is stale because it has been open for 60 days with no activity.
It runs fine now; you can upgrade to the latest version and try again.
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Software environment
Duplicate issues
Error description
Steps to reproduce & code
python export_model.py --model_name_or_path meta-llama/Llama-2-7b --output_path ./llama2-7b-static --dtype float16 --inference_model
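As a quick smoke test of a successful export, the saved static graph can be loaded back with paddle.jit.load, which returns the TranslatedLayer mentioned in the jit warning above. The path prefix below is an assumption based on --output_path; check the directory for the actual .pdmodel/.pdiparams names export_model.py writes:

    import paddle

    # Hypothetical prefix under the --output_path directory above.
    model = paddle.jit.load("./llama2-7b-static/llama")
    model.eval()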