Closed: Rane2021 closed this issue 4 months ago.
~/opt/py38/bin/python export_model.py --model_name_or_path FlagAlpha/Llama2-Chinese-7b-Chat --output_path ./inference --dtype float16
/root/opt/py38/lib/python3.8/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2023-12-13 16:27:41,659] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/tokenizer_config.json
[2023-12-13 16:27:41,660] [ INFO] - We are using <class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:27:41,660] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/sentencepiece.bpe.model
[2023-12-13 16:27:41,660] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community/FlagAlpha/Llama2-Chinese-7b-Chat/added_tokens.json and saved to /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat
[2023-12-13 16:27:41,819] [ WARNING] - file<https://bj.bcebos.com/paddlenlp/models/community/FlagAlpha/Llama2-Chinese-7b-Chat/added_tokens.json> not exist
[2023-12-13 16:27:41,821] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/special_tokens_map.json
[2023-12-13 16:27:41,821] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/tokenizer_config.json
[2023-12-13 16:27:41,862] [ ERROR] - Using pad_token, but it is not set yet.
[2023-12-13 16:27:41,999] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,002] [ INFO] - We are using <class 'paddlenlp.transformers.llama.modeling.LlamaForCausalLM'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:27:42,142] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,143] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:27:42,446] [ INFO] - Already cached /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/model_state.pdparams
[2023-12-13 16:27:42,447] [ INFO] - Loading weights file model_state.pdparams from cache at /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/model_state.pdparams
[2023-12-13 16:29:37,609] [ INFO] - Loaded weights file from disk, setting weights to model.
W1213 16:29:37.621098 125450 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.4, Runtime API Version: 11.2
W1213 16:29:37.625649 125450 gpu_resources.cc:149] device: 0, cuDNN Version: 8.1.
[2023-12-13 16:30:17,304] [ INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.
[2023-12-13 16:30:17,305] [ INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at FlagAlpha/Llama2-Chinese-7b-Chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2023-12-13 16:30:17,449] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/generation_config.json
[2023-12-13 16:30:17,452] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/generation_config.json
/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/configuration_utils.py:247: UserWarning: using greedy search strategy. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/configuration_utils.py:252: UserWarning: using greedy search strategy. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `decode_strategy="greedy_search" ` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
[2023-12-13 16:30:18,230] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:30:18,231] [ INFO] - We are using <class 'paddlenlp.transformers.llama.configuration.LlamaConfig'> to load 'FlagAlpha/Llama2-Chinese-7b-Chat'.
[2023-12-13 16:30:18,376] [ INFO] - Found /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
[2023-12-13 16:30:18,378] [ INFO] - Loading configuration file /root/.paddlenlp/models/FlagAlpha/Llama2-Chinese-7b-Chat/config.json
/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py:944: UserWarning: What you save is a function, and `jit.save` will generate the name of the model file according to `path` you specify. When loading these files with `jit.load`, you get a `TranslatedLayer` whose inference result is the same as the inference result of the function you saved.
warnings.warn(
Traceback (most recent call last):
  File "export_model.py", line 96, in <module>
    main()
  File "export_model.py", line 84, in main
    predictor.model.to_static(
  File "/root/opt/py38/lib/python3.8/site-packages/paddlenlp/generation/utils.py", line 1326, in to_static
    paddle.jit.save(model, path)
  File "/root/opt/py38/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py", line 752, in wrapper
    func(layer, path, input_spec, **configs)
  File "/root/opt/py38/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/fluid/dygraph/base.py", line 75, in __impl__
    return func(*args, **kwargs)
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/api.py", line 1085, in save
    attr_func.concrete_program_specify_input_spec(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py", line 709, in concrete_program_specify_input_spec
    concrete_program, _ = self.get_concrete_program(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py", line 564, in get_concrete_program
    args, kwargs = self._function_spec.unified_args_and_kwargs(
  File "/root/opt/py38/lib/python3.8/site-packages/paddle/jit/dy2static/function_spec.py", line 90, in unified_args_and_kwargs
    raise ValueError(error_msg)
ValueError: The decorated function `generate` requires 4 arguments: ['input_ids', 'generation_config', 'stopping_criteria', 'streamer'], but received 26 with (InputSpec(shape=(-1, -1), dtype=paddle.int64, name=None, stop_gradient=False), InputSpec(shape=(-1, -1), dtype=paddle.int64, name=None, stop_gradient=False), None, InputSpec(shape=(1,), dtype=paddle.int64, name=None, stop_gradient=False), 0, 'sampling', InputSpec(shape=(1,), dtype=paddle.float32, name=None, stop_gradient=False), 0, InputSpec(shape=(1,), dtype=paddle.float32, name=None, stop_gradient=False), 1, 1, 1, 0.0, False, 0, 0, 0, None, None, None, None, 1, 0.0, True, False, False).
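This ValueError is the actual failure: when paddle.jit.save traces the decorated `generate` function, dygraph-to-static first unifies the call arguments against the function's signature, and here `generate` declares 4 parameters while the export path hands it 26 InputSpecs and literals, so tracing aborts before any files are written. Per the comments below, upgrading PaddleNLP resolves this, and the reproduction command at the end also passes --inference_model. A minimal sketch of the underlying contract with a toy layer (all names below are illustrative, not from export_model.py):

    import paddle
    from paddle.static import InputSpec

    class Toy(paddle.nn.Layer):
        def __init__(self):
            super().__init__()
            self.fc = paddle.nn.Linear(8, 2)

        def forward(self, x):
            return self.fc(x)

    net = Toy()
    # One InputSpec per forward() parameter; a mismatched count raises the
    # same "requires N arguments, but received M" ValueError seen above.
    static_net = paddle.jit.to_static(
        net, input_spec=[InputSpec(shape=[None, 8], dtype="float32", name="x")]
    )
    paddle.jit.save(static_net, "./toy/inference")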
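Separately, the ERROR line "Using pad_token, but it is not set yet" earlier in the log appears because the Llama tokenizer ships without a pad token. It does not stop the export, but here is a hedged sketch of a common workaround, assuming the PaddleNLP tokenizer exposes the usual special-token attributes (reusing eos_token as padding is an assumption, not something the script requires):

    from paddlenlp.transformers import LlamaTokenizer

    tokenizer = LlamaTokenizer.from_pretrained("FlagAlpha/Llama2-Chinese-7b-Chat")
    if tokenizer.pad_token is None:
        # Borrow an existing special token for padding; decoder-only models
        # commonly reuse eos_token here (assumption, adjust as needed).
        tokenizer.pad_token = tokenizer.eos_token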
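The two UserWarnings about `temperature` and `top_p` come from the checkpoint's generation_config.json, which sets sampling parameters while the resolved decode strategy is greedy search. They are cosmetic for this export; a hedged sketch of silencing them by resetting the fields to their defaults, assuming a GenerationConfig class in paddlenlp.generation with the usual from_pretrained/save_pretrained pair (the output directory is illustrative):

    from paddlenlp.generation import GenerationConfig

    cfg = GenerationConfig.from_pretrained("FlagAlpha/Llama2-Chinese-7b-Chat")
    # Reset sampling-only knobs to their defaults so greedy search stops
    # warning about unused parameters.
    cfg.temperature = 1.0
    cfg.top_p = 1.0
    cfg.save_pretrained("./generation_config_fixed")  # illustrative path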
This issue is stale because it has been open for 60 days with no activity.
It runs fine now; you can upgrade to the latest version and try again.
This issue is stale because it has been open for 60 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Software environment
Duplicate issues
Error description
Steps to reproduce & code
python export_model.py --model_name_or_path meta-llama/Llama-2-7b --output_path ./llama2-7b-static --dtype float16 --inference_model
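As a quick smoke test of a successful export, the saved static graph can be loaded back with paddle.jit.load, which returns the TranslatedLayer mentioned in the jit warning above. The path prefix below is an assumption based on --output_path; check the directory for the actual .pdmodel/.pdiparams names export_model.py writes:

    import paddle

    # Hypothetical prefix under the --output_path directory above.
    model = paddle.jit.load("./llama2-7b-static/llama")
    model.eval()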