PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
44.37k stars 7.83k forks

KIE: when training on a custom dataset, the pretrained model specified in the config file does not take effect #13627

Open freezehe opened 3 months ago

freezehe commented 3 months ago

Search before asking

Bug

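As the title says: I am training on Baidu AI Studio, following the official documentation (https://github.com/PaddlePaddle/PaddleOCR/blob/main/doc/doc_ch/kie.md). I started by training the SER model, with the following config: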

Global:
  use_gpu: True
  epoch_num: &epoch_num 20
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/ccic/ser_vi_layoutxlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 19 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/kie/input/zh_val_42.jpg
  d2s_train_image_shape: [3, 224, 224]
  # if you want to predict using the groundtruth ocr info,
  # you can use the following config
  # infer_img: train_data/XFUND/zh_val/val.json
  # infer_mode: False

  save_res_path: ./output/ccic/ser/xfund_zh/res
  kie_rec_model_dir: 
  kie_det_model_dir:
  amp_custom_white_list: ['scale', 'concat', 'elementwise_add']

Architecture:
  model_type: kie
  algorithm: &algorithm "LayoutXLM"
  Transform:
  Backbone:
    name: LayoutXLMForSer
    pretrained: True
    checkpoints:
    # one of base or vi
    mode: vi
    num_classes: &num_classes 7

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Linear
    learning_rate: 0.00001
    epochs: *epoch_num
    warmup_epoch: 2
  regularizer:
    name: L2
    factor: 0.00000

PostProcess:
  name: VQASerTokenLayoutLMPostProcess
  class_path: &class_path train_data/XCCIC_8020/class_list_xfun.txt

Metric:
  name: VQASerTokenMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_train/image
    label_file_list: 
      - train_data/XCCIC_8020/zh_train/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: &use_textline_bbox_info True
          # one of [None, "tb-yx"]
          order_method: &order_method "tb-yx"
      - VQATokenPad:
          max_seq_len: &max_seq_len 512
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/XCCIC_8020/zh_val/image
    label_file_list:
      - train_data/XCCIC_8020/zh_val/val.json
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
          channel_first: False
      - VQATokenLabelEncode: # Class handling label
          contains_re: False
          algorithm: *algorithm
          class_path: *class_path
          use_textline_bbox_info: *use_textline_bbox_info
          order_method: *order_method
      - VQATokenPad:
          max_seq_len: *max_seq_len
          return_attention_mask: True
      - VQASerTokenChunk:
          max_seq_len: *max_seq_len
      - Resize:
          size: [224,224]
      - NormalizeImage:
          scale: 1
          mean: [ 123.675, 116.28, 103.53 ]
          std: [ 58.395, 57.12, 57.375 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 4
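To see what eval_batch_step: [ 0, 19 ] means for this run, the sketch below lists the global steps at which evaluation would fire. It assumes the rule "step > start and (step - start) is a multiple of interval", with steps counted from 1; that rule is my reading, not quoted from PaddleOCR. The numbers plugged in are the 18 iterations per epoch reported in the log and epoch_num: 20 from the config.

```python
# Sketch: which global steps would trigger evaluation for a given eval_batch_step.
# ASSUMPTION: steps are counted from 1 and evaluation runs when
#   step > start and (step - start) % interval == 0.
# This is a simplified model, not PaddleOCR's training loop.


def eval_steps(eval_batch_step, total_steps):
    """Return the global steps (1-based) at which evaluation would run."""
    start, interval = eval_batch_step
    return [
        step
        for step in range(1, total_steps + 1)
        if step > start and (step - start) % interval == 0
    ]


if __name__ == "__main__":
    iters_per_epoch = 18  # "train dataloader has 18 iters" in the log
    epoch_num = 20        # Global.epoch_num in the config
    steps = eval_steps([0, 19], iters_per_epoch * epoch_num)
    print(len(steps), steps[:3], steps[-1])
```

Under that assumption the run would evaluate 18 times, at steps 19, 38, 57, ..., 342.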

I added the line pretrained_model: ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained myself. Then I run the training command:

%cd /home/aistudio/PaddleOCR
!python3 tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml
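To check quickly whether a run of this command pulled the default weights from PaddleNLP rather than a local model, one can scan its log for the weight download line. The helper below is hypothetical; the message format is copied from the log in this issue and may differ between PaddleNLP versions.

```python
import re

# Matches lines such as
#   "... - Downloading https://.../model_state.pdparams and saved to ..."
# as printed in the log in this issue. The format may change between
# PaddleNLP versions, so treat an empty result as a hint, not proof.
_WEIGHT_DOWNLOAD = re.compile(r"Downloading (https?://\S+/model_state\.pdparams)")


def downloaded_weight_urls(log_text):
    """Return each model_state.pdparams URL the log reports downloading, once, in order."""
    urls = []
    for url in _WEIGHT_DOWNLOAD.findall(log_text):
        if url not in urls:
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = (
        "[2024-08-09 10:42:02,252] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/"
        "transformers/vi-layoutxlm-base-uncased/model_state.pdparams and saved to "
        "/home/aistudio/.paddlenlp/models/vi-layoutxlm-base-uncased\n"
        "[2024-08-09 10:42:02,252] [    INFO] - Downloading model_state.pdparams from "
        "https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams\n"
    )
    for url in downloaded_weight_urls(sample):
        print("weights downloaded from PaddleNLP:", url)
```

The tokenizer download (sentencepiece.bpe.model) is deliberately not matched, since only the weight file says anything about which model was loaded.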

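The log shows that a model is still downloaded by default and the pretrained model I configured is not used. What I want is to train on my custom data starting from the pretrained model provided in the official documentation.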
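The log below is consistent with the LayoutXLM backbone ignoring Global.pretrained_model: it prints pretrained_model under Global, but then downloads vi-layoutxlm-base-uncased/model_state.pdparams from PaddleNLP and reports classifier.weight and classifier.bias as not initialized from the pretrained model. The sketch models that reading as a plain function. It is an assumption about PaddleOCR's behaviour for LayoutXLM, not its code, and resolve_weight_source is a hypothetical name.

```python
# Simplified model of where the LayoutXLM SER backbone gets its starting weights.
# ASSUMPTION (my reading of the behaviour, not PaddleOCR's code):
#   1. if Architecture.Backbone.checkpoints is set, weights come from that directory;
#   2. otherwise the default PaddleNLP weights for Backbone.mode are downloaded;
#   Global.pretrained_model is not consulted on this path.
# The "vi" name comes from the download URL in the log; the "base" name is assumed.
DEFAULT_PADDLENLP_WEIGHTS = {
    "vi": "vi-layoutxlm-base-uncased",
    "base": "layoutxlm-base-uncased",
}


def resolve_weight_source(config):
    """Return 'local:<dir>' or 'download:<name>' for a KIE LayoutXLM config dict."""
    backbone = config["Architecture"]["Backbone"]
    checkpoints = backbone.get("checkpoints")
    if checkpoints:
        return f"local:{checkpoints}"
    # Global.pretrained_model is deliberately ignored here (see assumption above).
    return f"download:{DEFAULT_PADDLENLP_WEIGHTS[backbone.get('mode', 'base')]}"


if __name__ == "__main__":
    cfg = {
        "Global": {"pretrained_model": "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained"},
        "Architecture": {"Backbone": {"name": "LayoutXLMForSer", "mode": "vi", "checkpoints": None}},
    }
    print(resolve_weight_source(cfg))
```

For the config in this issue (checkpoints empty, mode: vi) it returns download:vi-layoutxlm-base-uncased, which matches what the log shows.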

/home/aistudio/PaddleOCR
[2024/08/09 10:41:58] ppocr INFO: Architecture : 
[2024/08/09 10:41:58] ppocr INFO:     Backbone : 
[2024/08/09 10:41:58] ppocr INFO:         checkpoints : None
[2024/08/09 10:41:58] ppocr INFO:         mode : vi
[2024/08/09 10:41:58] ppocr INFO:         name : LayoutXLMForSer
[2024/08/09 10:41:58] ppocr INFO:         num_classes : 7
[2024/08/09 10:41:58] ppocr INFO:         pretrained : True
[2024/08/09 10:41:58] ppocr INFO:     Transform : None
[2024/08/09 10:41:58] ppocr INFO:     algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:     model_type : kie
[2024/08/09 10:41:58] ppocr INFO: Eval : 
[2024/08/09 10:41:58] ppocr INFO:     dataset : 
[2024/08/09 10:41:58] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_val/image
[2024/08/09 10:41:58] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_val/val.json']
[2024/08/09 10:41:58] ppocr INFO:         name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO:         transforms : 
[2024/08/09 10:41:58] ppocr INFO:             DecodeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 channel_first : False
[2024/08/09 10:41:58] ppocr INFO:                 img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/09 10:41:58] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:                 contains_re : False
[2024/08/09 10:41:58] ppocr INFO:                 order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO:             VQATokenPad : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:                 return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO:             VQASerTokenChunk : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:             Resize : 
[2024/08/09 10:41:58] ppocr INFO:                 size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO:             NormalizeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO:                 order : hwc
[2024/08/09 10:41:58] ppocr INFO:                 scale : 1
[2024/08/09 10:41:58] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO:             ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO:             KeepKeys : 
[2024/08/09 10:41:58] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO:     loader : 
[2024/08/09 10:41:58] ppocr INFO:         batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO:         drop_last : False
[2024/08/09 10:41:58] ppocr INFO:         num_workers : 4
[2024/08/09 10:41:58] ppocr INFO:         shuffle : False
[2024/08/09 10:41:58] ppocr INFO: Global : 
[2024/08/09 10:41:58] ppocr INFO:     amp_custom_white_list : ['scale', 'concat', 'elementwise_add']
[2024/08/09 10:41:58] ppocr INFO:     cal_metric_during_train : False
[2024/08/09 10:41:58] ppocr INFO:     d2s_train_image_shape : [3, 224, 224]
[2024/08/09 10:41:58] ppocr INFO:     distributed : False
[2024/08/09 10:41:58] ppocr INFO:     epoch_num : 20
[2024/08/09 10:41:58] ppocr INFO:     eval_batch_step : [0, 19]
[2024/08/09 10:41:58] ppocr INFO:     infer_img : ppstructure/docs/kie/input/zh_val_42.jpg
[2024/08/09 10:41:58] ppocr INFO:     kie_det_model_dir : None
[2024/08/09 10:41:58] ppocr INFO:     kie_rec_model_dir : None
[2024/08/09 10:41:58] ppocr INFO:     log_smooth_window : 10
[2024/08/09 10:41:58] ppocr INFO:     pretrained_model : ./pretrained_model/ser_vi_layoutxlm_xfund_pretrained
[2024/08/09 10:41:58] ppocr INFO:     print_batch_step : 10
[2024/08/09 10:41:58] ppocr INFO:     save_epoch_step : 2000
[2024/08/09 10:41:58] ppocr INFO:     save_inference_dir : None
[2024/08/09 10:41:58] ppocr INFO:     save_model_dir : ./output/ccic/ser_vi_layoutxlm_xfund_zh
[2024/08/09 10:41:58] ppocr INFO:     save_res_path : ./output/ccic/ser/xfund_zh/res
[2024/08/09 10:41:58] ppocr INFO:     seed : 2022
[2024/08/09 10:41:58] ppocr INFO:     use_gpu : True
[2024/08/09 10:41:58] ppocr INFO:     use_visualdl : False
[2024/08/09 10:41:58] ppocr INFO: Loss : 
[2024/08/09 10:41:58] ppocr INFO:     key : backbone_out
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenLayoutLMLoss
[2024/08/09 10:41:58] ppocr INFO:     num_classes : 7
[2024/08/09 10:41:58] ppocr INFO: Metric : 
[2024/08/09 10:41:58] ppocr INFO:     main_indicator : hmean
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenMetric
[2024/08/09 10:41:58] ppocr INFO: Optimizer : 
[2024/08/09 10:41:58] ppocr INFO:     beta1 : 0.9
[2024/08/09 10:41:58] ppocr INFO:     beta2 : 0.999
[2024/08/09 10:41:58] ppocr INFO:     lr : 
[2024/08/09 10:41:58] ppocr INFO:         epochs : 20
[2024/08/09 10:41:58] ppocr INFO:         learning_rate : 1e-05
[2024/08/09 10:41:58] ppocr INFO:         name : Linear
[2024/08/09 10:41:58] ppocr INFO:         warmup_epoch : 2
[2024/08/09 10:41:58] ppocr INFO:     name : AdamW
[2024/08/09 10:41:58] ppocr INFO:     regularizer : 
[2024/08/09 10:41:58] ppocr INFO:         factor : 0.0
[2024/08/09 10:41:58] ppocr INFO:         name : L2
[2024/08/09 10:41:58] ppocr INFO: PostProcess : 
[2024/08/09 10:41:58] ppocr INFO:     class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:     name : VQASerTokenLayoutLMPostProcess
[2024/08/09 10:41:58] ppocr INFO: Train : 
[2024/08/09 10:41:58] ppocr INFO:     dataset : 
[2024/08/09 10:41:58] ppocr INFO:         data_dir : train_data/XCCIC_8020/zh_train/image
[2024/08/09 10:41:58] ppocr INFO:         label_file_list : ['train_data/XCCIC_8020/zh_train/train.json']
[2024/08/09 10:41:58] ppocr INFO:         name : SimpleDataSet
[2024/08/09 10:41:58] ppocr INFO:         ratio_list : [1.0]
[2024/08/09 10:41:58] ppocr INFO:         transforms : 
[2024/08/09 10:41:58] ppocr INFO:             DecodeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 channel_first : False
[2024/08/09 10:41:58] ppocr INFO:                 img_mode : RGB
[2024/08/09 10:41:58] ppocr INFO:             VQATokenLabelEncode : 
[2024/08/09 10:41:58] ppocr INFO:                 algorithm : LayoutXLM
[2024/08/09 10:41:58] ppocr INFO:                 class_path : train_data/XCCIC_8020/class_list_xfun.txt
[2024/08/09 10:41:58] ppocr INFO:                 contains_re : False
[2024/08/09 10:41:58] ppocr INFO:                 order_method : tb-yx
[2024/08/09 10:41:58] ppocr INFO:                 use_textline_bbox_info : True
[2024/08/09 10:41:58] ppocr INFO:             VQATokenPad : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:                 return_attention_mask : True
[2024/08/09 10:41:58] ppocr INFO:             VQASerTokenChunk : 
[2024/08/09 10:41:58] ppocr INFO:                 max_seq_len : 512
[2024/08/09 10:41:58] ppocr INFO:             Resize : 
[2024/08/09 10:41:58] ppocr INFO:                 size : [224, 224]
[2024/08/09 10:41:58] ppocr INFO:             NormalizeImage : 
[2024/08/09 10:41:58] ppocr INFO:                 mean : [123.675, 116.28, 103.53]
[2024/08/09 10:41:58] ppocr INFO:                 order : hwc
[2024/08/09 10:41:58] ppocr INFO:                 scale : 1
[2024/08/09 10:41:58] ppocr INFO:                 std : [58.395, 57.12, 57.375]
[2024/08/09 10:41:58] ppocr INFO:             ToCHWImage : None
[2024/08/09 10:41:58] ppocr INFO:             KeepKeys : 
[2024/08/09 10:41:58] ppocr INFO:                 keep_keys : ['input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
[2024/08/09 10:41:58] ppocr INFO:     loader : 
[2024/08/09 10:41:58] ppocr INFO:         batch_size_per_card : 8
[2024/08/09 10:41:58] ppocr INFO:         drop_last : False
[2024/08/09 10:41:58] ppocr INFO:         num_workers : 4
[2024/08/09 10:41:58] ppocr INFO:         shuffle : True
[2024/08/09 10:41:58] ppocr INFO: profiler_options : None
[2024/08/09 10:41:58] ppocr INFO: train with paddle 2.5.2 and device Place(gpu:0)
[2024/08/09 10:41:58] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_train/train.json']
list index out of range
[2024-08-09 10:41:59,583] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model and saved to /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased
[2024-08-09 10:41:59,640] [    INFO] - Downloading sentencepiece.bpe.model from https://bj.bcebos.com/paddlenlp/models/transformers/layoutxlm_base/sentencepiece.bpe.model
100%|██████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 5.25MB/s]
[2024-08-09 10:42:01,488] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:01,488] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024/08/09 10:42:01] ppocr INFO: Initialize indexs of datasets:['train_data/XCCIC_8020/zh_val/val.json']
[2024-08-09 10:42:01,490] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/sentencepiece.bpe.model
[2024-08-09 10:42:02,249] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/tokenizer_config.json
[2024-08-09 10:42:02,249] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/layoutxlm-base-uncased/special_tokens_map.json
[2024-08-09 10:42:02,252] [    INFO] - Downloading https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams and saved to /home/aistudio/.paddlenlp/models/vi-layoutxlm-base-uncased
[2024-08-09 10:42:02,252] [    INFO] - Downloading model_state.pdparams from https://bj.bcebos.com/paddlenlp/models/transformers/vi-layoutxlm-base-uncased/model_state.pdparams
100%|██████████████████████████████████████| 1.04G/1.04G [00:13<00:00, 80.3MB/s]
W0809 10:42:16.289948 80856 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 11.8
W0809 10:42:16.291229 80856 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
[2024-08-09 10:42:19,987] [    INFO] - Weights of LayoutXLMForTokenClassification not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
[2024/08/09 10:42:20] ppocr INFO: train dataloader has 18 iters
[2024/08/09 10:42:20] ppocr INFO: valid dataloader has 5 iters
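If the backbone is indeed initialised from Architecture.Backbone.checkpoints (printed as None in the log above) rather than from Global.pretrained_model, one thing worth trying is to set that key on the command line with PaddleOCR's -o option, e.g. appending -o Architecture.Backbone.checkpoints=./pretrained_model/ser_vi_layoutxlm_xfund_pretrained to the training command. Whether that directory, or a sub-folder inside the extracted archive, is the right target is not confirmed here. The snippet is a hypothetical, stdlib-only sketch of how such a dotted override lands in the nested config; parse_opt and apply_override are illustrative names, not PaddleOCR functions, and unlike PaddleOCR they keep values as plain strings instead of parsing them as YAML.

```python
# Hypothetical sketch of merging a dotted "-o Key.Sub=value" override into a
# nested config dict. Values stay plain strings here (an assumption/simplification).


def parse_opt(opt):
    """Split 'A.B.C=value' into (['A', 'B', 'C'], 'value') at the first '='."""
    key, sep, value = opt.partition("=")
    if not sep or not key.strip():
        raise ValueError(f"expected KEY=VALUE, got {opt!r}")
    return key.strip().split("."), value.strip()


def apply_override(config, opt):
    """Set a nested config entry from a dotted override; sections must already exist."""
    keys, value = parse_opt(opt)
    node = config
    for section in keys[:-1]:
        if not isinstance(node.get(section), dict):
            raise KeyError(f"unknown config section {section!r} in {opt!r}")
        node = node[section]
    node[keys[-1]] = value
    return config


if __name__ == "__main__":
    cfg = {
        "Global": {"pretrained_model": "./pretrained_model/ser_vi_layoutxlm_xfund_pretrained"},
        "Architecture": {"Backbone": {"name": "LayoutXLMForSer", "checkpoints": None}},
    }
    apply_override(cfg, "Architecture.Backbone.checkpoints=./pretrained_model/ser_vi_layoutxlm_xfund_pretrained")
    print(cfg["Architecture"]["Backbone"]["checkpoints"])
```

Requiring the parent sections to exist catches typos such as "Architecure.Backbone..." instead of silently creating a new, unused section.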

Environment

Baidu AI Studio
aiofiles==23.2.1 aiohttp==3.9.5 aiosignal==1.3.1 aistudio-sdk @ file:///home/aistudio/aistudio_sdk-0.2.4-py3-none-any.whl#sha256=d93411cc8764e465860cbf2f97f787dddd1548595d4776c97ddf0ea787dedd81 albucore==0.0.13 albumentations==1.4.10 altair==4.2.2 annotated-types==0.6.0 anyio==4.3.0 astor==0.8.1 asttokens==2.4.1 async-timeout==4.0.3 attrdict3==2.0.2 attrs==23.2.0 Babel==2.14.0 bce-python-sdk==0.9.6 beautifulsoup4==4.12.3 blinker==1.7.0 cachetools==5.3.3 certifi==2024.2.2 charset-normalizer==3.3.2 click==8.1.7 colorama==0.4.6 coloredlogs==15.0.1 colorlog==6.8.2 comm==0.2.2 contourpy==1.2.1 cycler==0.12.1 Cython==3.0.11 datasets==2.19.0 debugpy==1.8.1 decorator==5.1.1 dill==0.3.4 easydict==1.13 entrypoints==0.4 exceptiongroup==1.2.1 executing==2.0.1 fastapi==0.110.2 ffmpy==0.3.2 filelock==3.13.4 fire==0.6.0 Flask==3.0.3 Flask-Babel==2.0.0 flatbuffers==24.3.25 fonttools==4.51.0 frozenlist==1.4.1 fsspec==2024.3.1 future==1.0.0 gitdb==4.0.11 GitPython==3.1.43 gradio==3.40.0 gradio_client==0.15.1 gunicorn==22.0.0 h11==0.14.0 httpcore==1.0.5 httpx==0.27.0 huggingface-hub==0.22.2 humanfriendly==10.0 idna==3.7 imageio==2.34.2 imgaug==0.4.0 importlib_metadata==7.1.0 importlib_resources==6.4.0 ipykernel==6.29.4 ipython==8.23.0 itsdangerous==2.2.0 jedi==0.19.1 jieba==0.42.1 Jinja2==3.1.3 joblib==1.4.0 jsonschema==4.21.1 jsonschema-specifications==2023.12.1 jupyter_client==8.6.1 jupyter_core==5.7.2 kiwisolver==1.4.5 lazy_loader==0.4 linkify-it-py==2.0.3 lmdb==1.5.1 lxml==5.2.2 markdown-it-py==2.2.0 MarkupSafe==2.1.5 matplotlib==3.8.4 matplotlib-inline==0.1.7 mdit-py-plugins==0.3.3 mdurl==0.1.1 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.12.2 nest-asyncio==1.6.0 networkx==3.3 numpy==1.26.4 onnx==1.16.0 onnxruntime==1.17.3 opencv-contrib-python==4.10.0.84 opencv-python==4.9.0.80 opencv-python-headless==4.10.0.84 opt-einsum==3.3.0 orjson==3.10.1 packaging==24.0 paddle2onnx==1.2.1 paddlefsl==1.1.0 paddlehub==2.4.0 paddlenlp==2.5.2 paddleocr==2.8.1 paddlepaddle-gpu @ file:///tmp/paddlepaddle_gpu-2.5.2-cp310-cp310-linux_x86_64.whl#sha256=2b4a84c853c7c88ddf4984c667bfcb824cc8a28a674448099452f50c686cc1bb pandas==2.2.2 parso==0.8.4 pexpect==4.9.0 pickleshare==0.7.5 pillow==10.3.0 platformdirs==4.2.0 prettytable==3.10.0 prompt-toolkit==3.0.43 protobuf==3.20.3 psutil==5.9.8 ptyprocess==0.7.0 pure-eval==0.2.2 pyarrow==16.0.0 pyarrow-hotfix==0.6 pybind11==2.12.0 pyclipper==1.3.0.post5 pycryptodome==3.20.0 pydantic==2.7.0 pydantic_core==2.18.1 pydeck==0.9.1 pydub==0.25.1 Pygments==2.17.2 Pympler==1.0.1 pypandoc==1.13 pyparsing==3.1.2 python-dateutil==2.9.0.post0 python-docx==1.1.2 python-multipart==0.0.9 pytz==2024.1 PyYAML==6.0.1 pyzmq==26.0.2 rapidfuzz==3.9.6 rarfile==4.2 referencing==0.34.0 requests==2.31.0 rich==13.7.1 rpds-py==0.18.0 ruff==0.4.1 safetensors==0.4.3 scikit-image==0.24.0 scikit-learn==1.4.2 scipy==1.13.0 semantic-version==2.10.0 semver==3.0.2 sentencepiece==0.2.0 seqeval==1.2.2 shapely==2.0.5 shellingham==1.5.4 six==1.16.0 smmap==5.0.1 sniffio==1.3.1 soupsieve==2.5 stack-data==0.6.3 starlette==0.37.2 streamlit==1.13.0 streamlit-image-comparison==0.0.4 sympy==1.12 termcolor==2.4.0 threadpoolctl==3.4.0 tifffile==2024.7.24 toml==0.10.2 tomli==2.0.1 tomlkit==0.12.0 tool-helpers==0.1.1 toolz==0.12.1 tornado==6.4 tqdm==4.66.2 traitlets==5.14.3 typer==0.12.3 typing_extensions==4.11.0 tzdata==2024.1 tzlocal==5.2 uc-micro-py==1.0.3 urllib3==2.2.1 uvicorn==0.29.0 validators==0.28.3 visualdl==2.4.2 watchdog==4.0.1 wcwidth==0.2.13 websockets==11.0.3 Werkzeug==3.0.2 xxhash==3.4.1 yacs==0.1.8 yarl==1.9.4 zipp==3.19.2

Additional

No response

Are you willing to submit a PR?

kingleft commented 2 months ago

Same situation here. Have you found a solution?

zyk0901 commented 2 months ago

How do you fix this?

metoogo commented 3 weeks ago

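What does your pretrained model look like after downloading? When I extracted the tar file I downloaded, I got another archive with no file extension. After adding an extension and extracting it again, I got 3 files, but the .pdopt file was missing, so I still can't use it as a pretrained model.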
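Regarding the extracted archive: a small stdlib-only check like the one below can show whether a directory still contains a nested archive (such as the extension-less file described above) and whether the files a PaddleNLP-style loader would look for are present. The required file names are assumptions: model_state.pdparams is the name seen in the PaddleNLP download URL in the log, model_config.json is a guess at the config file name, and the .pdopt file (optimizer state) is assumed to be needed only for resuming training, not for initialising weights. check_pretrained_dir is a hypothetical helper.

```python
import tarfile
from pathlib import Path

# ASSUMPTIONS (not confirmed in this thread):
# - weights are loaded from a directory via a PaddleNLP-style from_pretrained();
# - that directory needs model_state.pdparams (name seen in the log's download URL)
#   and model_config.json (an assumed config file name).
# A .pdopt file holds optimizer state and is assumed to matter only for resuming.
REQUIRED_FILES = ("model_state.pdparams", "model_config.json")


def check_pretrained_dir(path):
    """Report missing required files and any files that are themselves tar archives."""
    root = Path(path)
    if not root.is_dir():
        return {"is_dir": False, "files": [], "missing": list(REQUIRED_FILES), "nested_archives": []}
    files = sorted(p.name for p in root.iterdir() if p.is_file())
    return {
        "is_dir": True,
        "files": files,
        "missing": [name for name in REQUIRED_FILES if name not in files],
        # e.g. an extension-less file that is really another tar archive
        "nested_archives": [name for name in files if tarfile.is_tarfile(str(root / name))],
    }


if __name__ == "__main__":
    print(check_pretrained_dir("./pretrained_model/ser_vi_layoutxlm_xfund_pretrained"))
```

A non-empty nested_archives list means there is still something to extract before pointing any config key at the directory.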