PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0

[Bug]: FileNotFoundError: configuration file<config.json> or <model_config.json> not found #4738

Closed gcr1992 closed 1 year ago

gcr1992 commented 1 year ago

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.3.2.post111
- paddlenlp: tried both 2.5.0.post0 and 2.5.0

Duplicate issue

Error description

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/code_generation/codegen

I tried the PaddleNLP codegen example. After installing and testing it with code, it keeps raising FileNotFoundError: configuration file<config.json> or <model_config.json> not found. I tried both paddlenlp 2.5.0.post0 and 2.5.0.

Tracing through the source of code_generation.py:
self._construct_tokenizer(model) downloads its files normally, but
self._construct_model(model) cannot download the configuration file from either of these URLs:

'https://bj.bcebos.com/paddlenlp/models/community/Salesforce/codegen-350M-mono/config.json'
'https://bj.bcebos.com/paddlenlp/models/community/Salesforce/codegen-350M-mono/model_config.json'
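As a possible stop-gap while the automatic download fails, the config can be fetched by hand into the local cache directory where the tokenizer files already landed. This is only a sketch: the cache layout (~/.paddlenlp/models/<model-name>/) is inferred from the download log below and may differ on other setups.

```python
# Hedged workaround sketch: manually place config.json into the assumed
# PaddleNLP cache directory (~/.paddlenlp/models/<model-name>/).
from pathlib import Path
import urllib.request

model_name = "Salesforce/codegen-350M-mono"
cache_dir = Path.home() / ".paddlenlp" / "models" / model_name
cache_dir.mkdir(parents=True, exist_ok=True)

url = f"https://bj.bcebos.com/paddlenlp/models/community/{model_name}/config.json"
urllib.request.urlretrieve(url, cache_dir / "config.json")
```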

Steps to reproduce & code

1. Install:
   (1) Download the develop branch or v2.5.0 from https://github.com/PaddlePaddle/PaddleNLP to a local Windows 10 machine.
   (2) pip uninstall -y paddlenlp
   (3) In the unpacked PaddleNLP directory, run python setup.py install
2. Test code:
   from paddlenlp import Taskflow
   prompt = "def lengthOfLongestSubstring(self, s: str) -> int:"
   codegen = Taskflow("code_generation", model="Salesforce/codegen-350M-mono", decode_strategy="greedy_search", repetition_penalty=1.0)
   print(codegen(prompt))

3. Resulting log:

D:\Program Files\Python37\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-02-10 14:12:35,367] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/vocab.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:35,512] [ INFO] - Downloading vocab.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/vocab.json
100%|██████████| 779k/779k [00:01<00:00, 773kB/s]
[2023-02-10 14:12:36,900] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/merges.txt and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:37,069] [ INFO] - Downloading merges.txt from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/merges.txt
100%|██████████| 446k/446k [00:00<00:00, 562kB/s]
[2023-02-10 14:12:38,164] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/added_tokens.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,305] [ INFO] - Downloading added_tokens.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/added_tokens.json
100%|██████████| 0.98k/0.98k [00:00<?, ?B/s]
[2023-02-10 14:12:38,429] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/special_tokens_map.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,561] [ INFO] - Downloading special_tokens_map.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/special_tokens_map.json
100%|██████████| 90.0/90.0 [00:00<?, ?B/s]
[2023-02-10 14:12:38,708] [ INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/tokenizer_config.json and saved to C:\Users\gcr\.paddlenlp\models\Salesforce/codegen-350M-mono
[2023-02-10 14:12:38,831] [ INFO] - Downloading tokenizer_config.json from https://bj.bcebos.com/paddlenlp/models/community//Salesforce/codegen-350M-mono/tokenizer_config.json
100%|██████████| 177/177 [00:00<?, ?B/s]
[2023-02-10 14:12:39,008] [ INFO] - Adding to the vocabulary
[... the "Adding to the vocabulary" line repeats once per added token; the token text was lost when the log was pasted ...]
Traceback (most recent call last):
  File "F:/pythonProject/rpa/PaddleNLP/testCode.py", line 6, in <module>
    codegen = Taskflow("code_generation", model="Salesforce/codegen-350M-mono", decode_strategy="greedy_search", repetition_penalty=1.0)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\taskflow.py", line 591, in __init__
    model=self.model, task=self.task, priority_path=self.priority_path, from_hf_hub=from_hf_hub, **self.kwargs
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\code_generation.py", line 59, in __init__
    self._construct_model(model)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\taskflow\code_generation.py", line 65, in _construct_model
    self._model = CodeGenForCausalLM.from_pretrained(model)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\model_utils.py", line 486, in from_pretrained
    pretrained_model_name_or_path, from_hf_hub=from_hf_hub, subfolder=subfolder, *args, **kwargs
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1328, in from_pretrained_v2
    **kwargs,
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 736, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 758, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Program Files\Python37\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 831, in _get_config_dict
    raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
FileNotFoundError: configuration file<config.json> or <model_config.json> not found

gongel commented 1 year ago

Hi, I couldn't reproduce this issue on my side. Please make sure that the Python used by pip and the Python running the program are the same one.
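A quick, generic way to check this (nothing PaddleNLP-specific is assumed here) is to print, from inside the failing environment, which interpreter is running and where paddlenlp was imported from, and compare that against the interpreter pip installed into:

```python
# Shows which interpreter is running and which paddlenlp installation it picks up.
import sys
import paddlenlp

print(sys.executable)
print(paddlenlp.__version__, paddlenlp.__file__)
```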

gcr1992 commented 1 year ago

> Hi, I couldn't reproduce this issue on my side. Please make sure that the Python used by pip and the Python running the program are the same one.

I'm sure it's the same one. I ran pip list in the PyCharm terminal and the package shows up there, and it is also visible among the project's dependencies. If a package were missing, it would normally be reported as such.

caorushizi commented 1 year ago

You can patch it yourself: https://github.com/PaddlePaddle/PaddleNLP/blob/e73834944d8f8a1d0376664f579fa513db411c8f/paddlenlp/transformers/configuration_utils.py#L823

Change it to: community_url = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{CONFIG_NAME}"

https://github.com/PaddlePaddle/PaddleNLP/blob/e73834944d8f8a1d0376664f579fa513db411c8f/paddlenlp/transformers/configuration_utils.py#L827

Change it to: community_url = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{LEGACY_CONFIG_NAME}"

https://github.com/PaddlePaddle/PaddleNLP/blob/e73834944d8f8a1d0376664f579fa513db411c8f/paddlenlp/transformers/model_utils.py#L1067

Change it to: community_model_file_path = f"{COMMUNITY_MODEL_PREFIX}/{pretrained_model_name_or_path}/{cls.resource_files_names['model_state']}"

caorushizi commented 1 year ago

os.path.join was used to build the URL, and on Windows it joins with '\', so the resulting requests call gets a 404.
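A minimal standalone illustration of that behavior (standard library only, independent of PaddleNLP): on Windows, os.path.join inserts the platform separator, so a URL built with it contains backslashes, whereas an f-string (or posixpath.join) keeps forward slashes on every platform.

```python
import os
import posixpath

prefix = "https://bj.bcebos.com/paddlenlp/models/community"
model = "Salesforce/codegen-350M-mono"

# On Windows this produces '...community\\Salesforce/codegen-350M-mono\\config.json',
# which the file server rejects with 404 when requested as a URL.
broken = os.path.join(prefix, model, "config.json")

# Joining with '/' explicitly behaves the same way on every platform.
fixed = f"{prefix}/{model}/config.json"
assert fixed == posixpath.join(prefix, model, "config.json")
print(broken)
print(fixed)
```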

sijunhe commented 1 year ago

Thanks @caorushizi and @gcr1992 for the feedback. The logic we wrote here is indeed wrong and breaks for Windows users. It has been fixed in #4758; once that is merged, the problem should be resolved.

gongel commented 1 year ago

OK. This issue should have been fixed once before in #3640. Thanks everyone for the feedback!

iouen commented 1 year ago
[screenshot of the error attached]

On a Mac with version 2.5.1, the same problem occurs when using: from paddlenlp import Taskflow

The default model is pai-painter-painting-base-zh.

text_to_image = Taskflow("text_to_image")

iouen commented 1 year ago
[screenshot of the error attached]

I made the changes suggested above here, and it still errors.

JunnYu commented 1 year ago

@iouen This PR is currently upgrading the pretrained config: https://github.com/PaddlePaddle/PaddleNLP/pull/4992. If you want to try text-to-image, we recommend getting started quickly with ppdiffusers: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers
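For reference, a minimal text-to-image sketch along the lines suggested above. The pipeline class and checkpoint name follow the ppdiffusers examples and are assumptions here; check the linked repository for the current API.

```python
# Hedged sketch, assuming ppdiffusers exposes a diffusers-style StableDiffusionPipeline
# and that the "runwayml/stable-diffusion-v1-5" weights are available for it.
from ppdiffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a watercolor painting of mountains at sunrise").images[0]
image.save("output.png")
```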

northovo commented 1 year ago

D:\Python>python finetune.py --device cpu --logging_steps 5 --save_steps 25 --eval_steps 25 --seed 42 --model_name_or_path uie-x-base --output_dir ./document/model_best --train_path document/data/train.txt --dev_path /document/data/dev.txt --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --num_train_epochs 5 --learning_rate 1e-5 --label_names 'start_position' 'end_position' --do_train --do_eval --do_export --export_model_dir ./document/model_best --overwrite_output_dir --disable_tqdm True --metric_for_best_model eval_f1 --load_best_model_at_end True --save_total_limit 1

D:\Anacoda\lib\site-packages\_distutils_hack\__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-03-12 00:08:19,796] [ WARNING] - evaluation_strategy reset to IntervalStrategy.STEPS for do_eval is True. you can also set evaluation_strategy='epoch'.
[2023-03-12 00:08:19,796] [ INFO] - The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2023-03-12 00:08:19,796] [ INFO] - ============================================================
[2023-03-12 00:08:19,796] [ INFO] - Model Configuration Arguments
[2023-03-12 00:08:19,796] [ INFO] - paddle commit id : 0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-12 00:08:19,796] [ INFO] - export_model_dir : ./document/model_best
[2023-03-12 00:08:19,796] [ INFO] - model_name_or_path : uie-x-base
[2023-03-12 00:08:19,796] [ INFO] - multilingual : False
[2023-03-12 00:08:19,796] [ INFO] - ============================================================
[2023-03-12 00:08:19,796] [ INFO] - Data Configuration Arguments
[2023-03-12 00:08:19,796] [ INFO] - paddle commit id : 0e92adceae06b6b7463f2dc7790ffb0601730009
[2023-03-12 00:08:19,796] [ INFO] - dev_path : /document/data/dev.txt
[2023-03-12 00:08:19,796] [ INFO] - dynamic_max_length : None
[2023-03-12 00:08:19,796] [ INFO] - max_seq_length : 512
[2023-03-12 00:08:19,796] [ INFO] - train_path : document/data/train.txt
[2023-03-12 00:08:19,812] [ WARNING] - Process rank: -1, device: cpu, world_size: 1, distributed training: False, 16-bits training: False
[2023-03-12 00:08:19,812] [ INFO] - We are using <class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'> to load 'uie-x-base'.
[2023-03-12 00:08:19,812] [ INFO] - Already cached C:\Users\24210\.paddlenlp\models\uie-x-base\vocab.txt
[2023-03-12 00:08:19,812] [ INFO] - Already cached C:\Users\24210\.paddlenlp\models\uie-x-base\sentencepiece.bpe.model
[2023-03-12 00:08:20,378] [ INFO] - tokenizer config file saved in C:\Users\24210\.paddlenlp\models\uie-x-base\tokenizer_config.json
[2023-03-12 00:08:20,378] [ INFO] - Special tokens file saved in C:\Users\24210\.paddlenlp\models\uie-x-base\special_tokens_map.json
Traceback (most recent call last):
  File "D:\Python\finetune.py", line 244, in <module>
    main()
  File "D:\Python\finetune.py", line 134, in main
    model = UIE.from_pretrained(model_args.model_name_or_path)
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\model_utils.py", line 484, in from_pretrained
    return cls.from_pretrained_v2(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\model_utils.py", line 1320, in from_pretrained_v2
    config, model_kwargs = cls.config_class.from_pretrained(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 699, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 722, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(
  File "D:\Anacoda\lib\site-packages\paddlenlp\transformers\configuration_utils.py", line 797, in _get_config_dict
    raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found")
FileNotFoundError: configuration file<config.json> or <model_config.json> not found

lili-changjiang commented 1 year ago

I have the same problem as above.

byy-git commented 1 year ago

I got the same error: raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found") FileNotFoundError: configuration file<config.json> or <model_config.json> not found

byy-git commented 1 year ago

> I got the same error: raise FileNotFoundError(f"configuration file<{CONFIG_NAME}> or <{LEGACY_CONFIG_NAME}> not found") FileNotFoundError: configuration file<config.json> or <model_config.json> not found

I found that the error was caused by using the wrong finetune.py: it should be ./document/finetune.py, not ./text/finetune.py.

zhaoqf-cq commented 1 year ago

Has nobody noticed that the config URLs being requested are https://bj.bcebos.com/paddlenlp/models/community/uie-x-base/config.json and https://bj.bcebos.com/paddlenlp/models/community/uie-x-base/model_config.json? Can changing the local code really help? The configuration simply can't be fetched from there.
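One way to tell whether those URLs are actually missing on the server, rather than being mangled on the client, is to probe them directly (plain HTTP, no PaddleNLP involved):

```python
# Probes the two community config URLs for uie-x-base: a 404 means the file
# really is absent on the server, while a 200 points at a client-side problem
# (for example the URL-joining bug discussed earlier in this thread).
import requests

base = "https://bj.bcebos.com/paddlenlp/models/community/uie-x-base"
for name in ("config.json", "model_config.json"):
    resp = requests.head(f"{base}/{name}", allow_redirects=True)
    print(name, resp.status_code)
```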

chenzaichun commented 11 months ago

Same problem here; the corresponding configuration file can't be fetched.