Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
[2024-03-26 17:25:38,727] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-03-26 17:25:39,159] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-03-26 17:25:39,159] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024/03/26 17:25:39] ppocr INFO: Initialize indexs of datasets:['train_data/zzsfp/val.json']
[2024-03-26 17:25:39,161] [ INFO] - Already cached /root/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-03-26 17:25:39,606] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-03-26 17:25:39,606] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024-03-26 17:25:39,608] [ WARNING] - You are using a model of type layoutlmv2 to instantiate a model of type layoutxlm. This is not supported for all configurations of models and can yield errors.
[2024-03-26 17:25:39,618] [ INFO] - HTTPSConnectionPool(host='bj.bcebos.com', port=443): Max retries exceeded with url: /paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f21e016cdc0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Everything that follows is network-related.
Traceback (most recent call last):
  File "/workdir/model/PaddleOCR-release-2.6/tools/train.py", line 208, in <module>
    main(config, device, logger, vdl_writer)
  File "/workdir/model/PaddleOCR-release-2.6/tools/train.py", line 121, in main
    model = build_model(config['Architecture'])
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/architectures/__init__.py", line 34, in build_model
    arch = getattr(mod, name)(config)
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/architectures/distillation_model.py", line 47, in __init__
    model = BaseModel(model_config)
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/architectures/base_model.py", line 55, in __init__
    self.backbone = build_backbone(config["Backbone"], model_type)
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/backbones/__init__.py", line 74, in build_backbone
    module_class = eval(module_name)(**config)
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/backbones/vqa_layoutlm.py", line 149, in __init__
    super(LayoutXLMForSer, self).__init__(
  File "/workdir/model/PaddleOCR-release-2.6/ppocr/modeling/backbones/vqa_layoutlm.py", line 60, in __init__
    base_model = base_model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 2125, in from_pretrained
    resolved_archive_file, sharded_metadata, is_sharded = cls._resolve_model_file_path(
  File "/usr/local/lib/python3.10/site-packages/paddlenlp/transformers/model_utils.py", line 1633, in _resolve_model_file_path
    raise EnvironmentError(
OSError: Can't load the model for 'vi-layoutxlm-base-uncased'. If you were trying to load it from 'https://paddlenlp.bj.bcebos.com'
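The server is on an internal network with no access to the internet. I copied the bpe file into the model path shown in the error message, and that worked. I did the same with the pdparams file, but the program still fails because it keeps trying to download this model from the internet. I am running with the config ser_vi_layoutxlm_xfund_zh_udml.yml, and I cannot figure out where to change the path. I tried -o Global.pretained_model=root/.paddlenlp/models/layoutxlm-base-uncased/, with no effect.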
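One thing that may be worth checking: in the log above, the tokenizer files are cached under `layoutxlm-base-uncased`, while the failing download is for `vi-layoutxlm-base-uncased/model_state.pdparams`. The sketch below lists which expected files are absent from a given cache directory. It uses only the standard library. The helper name `missing_cache_files` is hypothetical, and the idea that the weights are looked up under `/root/.paddlenlp/models/vi-layoutxlm-base-uncased/` is an assumption inferred from the log paths and the download URL, not confirmed PaddleNLP behavior.

```python
import os


def missing_cache_files(cache_root, model_name, expected_files):
    """Return the names in expected_files that are not regular files
    under cache_root/model_name (hypothetical helper for illustration)."""
    model_dir = os.path.join(cache_root, model_name)
    return [
        name
        for name in expected_files
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


if __name__ == "__main__":
    # Cache root taken from the log lines above.
    cache_root = "/root/.paddlenlp/models"
    # Directory names seen in the log; whether the weights are resolved
    # under the "vi-" directory is an assumption.
    for model_name in ("layoutxlm-base-uncased", "vi-layoutxlm-base-uncased"):
        missing = missing_cache_files(
            cache_root, model_name, ["model_state.pdparams"]
        )
        print(model_name, "missing:", missing)
```

If the file shows up as missing only under the `vi-` directory, copying the pdparams file there might be a next step to try, though that is a guess from the log rather than a verified fix.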