PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: AttributeError: 'PdArgumentParser' object has no attribute 'parse_json_file_and_cmd_lines' #7875

Closed · pariskang closed this issue 6 months ago

pariskang commented 10 months ago

Software environment

paddle2onnx                1.1.0
paddlefsl                  1.1.0
paddlehub                  2.4.0
paddlenlp                  2.6.1.post0
paddlepaddle-gpu           2.5.2

Duplicate issues

Error description

The following problem occurs in this project; could you please help look into the cause:
https://aistudio.baidu.com/projectdetail/7337156

/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP/llm/finetune_generation.py", line 626, in <module>
    main()
  File "/home/aistudio/PaddleNLP/llm/finetune_generation.py", line 64, in main
    gen_args, quant_args, model_args, data_args, training_args = parser.parse_json_file_and_cmd_lines()
AttributeError: 'PdArgumentParser' object has no attribute 'parse_json_file_and_cmd_lines'

Steps & code to reproduce reliably

%cd ~/PaddleNLP/llm/

Single-GPU training

finetune_generation.zip

!python finetune_generation.py ./llama/lora_argument.json
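
For reference, a quick diagnostic (hypothetical, not part of the original report) that checks whether the installed PaddleNLP actually exposes the method the script calls:

# Hypothetical check: confirm which PaddleNLP is installed and whether the
# parser method used by the new llm scripts is available.
import paddlenlp
from paddlenlp.trainer import PdArgumentParser

print("paddlenlp version:", paddlenlp.__version__)
# Expected to print False on 2.6.x releases, True on the dev / release/2.7 code.
print("has parse_json_file_and_cmd_lines:",
      hasattr(PdArgumentParser, "parse_json_file_and_cmd_lines"))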

ZHUI commented 10 months ago

This happens because the latest script changed the code; it requires the dev version of PaddleNLP. Install it as follows.

pip install --pre --upgrade paddlenlp -f https://www.paddlepaddle.org.cn/whl/paddlenlp.html

Alternatively, switching your training code to the release/2.7 branch also works.
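
For environments that cannot upgrade right away, a minimal compatibility sketch (assumption: pre-2.7 releases only provide parse_json_file, as the older finetune scripts used; the new scripts call parse_json_file_and_cmd_lines, which also accepts command-line overrides):

# Minimal compatibility sketch; parse_finetune_args is a hypothetical helper.
import os
import sys

def parse_finetune_args(parser):
    if hasattr(parser, "parse_json_file_and_cmd_lines"):
        # Newer PaddleNLP: reads the JSON file named on the command line plus
        # any extra CLI overrides in one call.
        return parser.parse_json_file_and_cmd_lines()
    # Older releases: only the JSON file path is read.
    return parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))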

pariskang commented 10 months ago

Hello, thanks for the earlier solution, but a new error now appears. Could you please take another look:

paddle2onnx                1.1.0
paddlefsl                  1.1.0
paddlehub                  2.4.0
paddlenlp                  2.7.1.post0    /home/aistudio/PaddleNLP
paddlepaddle-gpu           2.5.2

!python finetune_generation.py ./llama/lora_argument.json

Downloading shards: 100%|█████████████████████████| 5/5 [02:59<00:00, 35.86s/it]
W0122 13:36:40.710914  2073 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0122 13:36:40.713243  2073 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
Loading checkpoint shards: 100%|██████████████████| 5/5 [00:22<00:00,  4.55s/it]
[2024-01-22 13:37:16,462] [    INFO] - All model checkpoint weights were used when initializing LlamaForCausalLM.

[2024-01-22 13:37:16,463] [    INFO] - All the weights of LlamaForCausalLM were initialized from the model checkpoint at facebook/llama-7b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[2024-01-22 13:37:16,525] [    INFO] - Generation config file not found, using a generation config created from the model config.
[2024-01-22 13:37:16,526] [    INFO] - We are using (<class 'paddlenlp.transformers.llama.tokenizer.LlamaTokenizer'>, False) to load 'facebook/llama-7b'.
[2024-01-22 13:37:16,526] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/llama/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/facebook/llama-7b
[2024-01-22 13:37:16,569] [    INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/llama/sentencepiece.bpe.model
100%|████████████████████████████████████████| 488k/488k [00:00<00:00, 13.1MB/s]
[2024-01-22 13:37:16,719] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/facebook/llama-7b/tokenizer_config.json
[2024-01-22 13:37:16,719] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/facebook/llama-7b/special_tokens_map.json
Traceback (most recent call last):
  File "/home/aistudio/PaddleNLP/llm/finetune_generation.py", line 626, in <module>
    main()
  File "/home/aistudio/PaddleNLP/llm/finetune_generation.py", line 285, in main
    train_ds = load_dataset(data_args.dataset_name_or_path, splits=["train"])[0]
  File "/home/aistudio/PaddleNLP/paddlenlp/datasets/dataset.py", line 196, in load_dataset
    datasets = load_from_hf(
  File "/home/aistudio/PaddleNLP/paddlenlp/datasets/dataset.py", line 116, in load_from_hf
    hf_datasets = load_hf_dataset(path, name=name, split=splits, **kwargs)
  File "/home/aistudio/PaddleNLP/paddlenlp/datasets/dataset.py", line 56, in load_from_ppnlp
    return origin_load_dataset(path, *args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1815, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1512, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1479, in dataset_module_factory
    raise e
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/datasets/load.py", line 1453, in dataset_module_factory
    dataset_info = hf_api.dataset_info(
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: './data'.
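
This second traceback suggests data_args.dataset_name_or_path resolves to ./data, which does not exist locally, so load_dataset falls through to treating it as a Hugging Face Hub repo id and rejects it. A minimal sketch of preparing that local directory (the src/tgt JSON-lines layout and file names are assumptions based on the PaddleNLP llm examples, and the contents are placeholders):

# Hypothetical data preparation: write tiny train/dev splits into ./data,
# one {"src": ..., "tgt": ...} JSON object per line.
import json
import os

os.makedirs("data", exist_ok=True)
samples = [
    {"src": "Give three tips for staying healthy.",
     "tgt": "1. Eat well. 2. Exercise. 3. Sleep enough."},
]
for split in ("train", "dev"):
    with open(os.path.join("data", f"{split}.json"), "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")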

github-actions[bot] commented 7 months ago

This issue is stale because it has been open for 60 days with no activity.

github-actions[bot] commented 6 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.