PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition for 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
Apache License 2.0

Finetune model performs more poorly as accuracy increases #11557

Closed: connorourke closed this issue 1 month ago

connorourke commented 5 months ago

Please provide the following information to quickly locate the problem

I am trying to finetune the arabic_PP-OCRv4_rec_train model. I have a training set consisting of ~14000 words.

I am using the following config.yml file:

Global:
  debug: false
  use_gpu: true
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec/arabic_rec_finetune_ppocrv4
  save_epoch_step: 10
  eval_batch_step: [0, 2000]
  cal_metric_during_train: true
  pretrained_model: /home/connorourke/.paddleocr/whl/rec/arabic/arabic_PP-OCRv4_rec_train/best_accuracy
  checkpoints: null 
  save_inference_dir:
  use_visualdl: false
  infer_img: doc/imgs_words/arabic/ar_2.jpg
  character_dict_path: ./ppocr/utils/dict/arabic_dict.txt
  max_text_length: 25
  infer_mode: false
  use_space_char: true
  distributed: true
  save_res_path: ./output/rec/arabic_rec_finetune_ppocrv4.txt

Optimizer:
  name: Adam
  beta1: 0.9  
  lr:
    name: Piecewise
    decay_epochs: [50, 100]
    values: [0.0001, 0.00005, 0.00001]
    warmup_epoch: 5
  beta2: 0.999
  regularizer:
    name: L2
    factor: 3.0e-05

Architecture:
  model_type: rec
  algorithm: SVTR_LCNet
  Transform:
  Backbone:
    name: PPLCNetV3
    scale: 0.95
  Head:
    name: MultiHead
    head_list:
      - CTCHead:
          Neck:
            name: svtr
            dims: 120
            depth: 2
            hidden_dims: 120
            kernel_size: [1, 3]
            use_guide: True
          Head:
            fc_decay: 0.00001
      - NRTRHead:
          nrtr_dim: 384
          max_text_length: 25

Loss:
  name: MultiLoss
  loss_config_list:
    - CTCLoss:
    - NRTRLoss:

PostProcess:  
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: MultiScaleDataSet
    ds_width: false
    data_dir: train_data/Recognition/words/Training_Set/
    ext_op_transform_idx: 1
    label_file_list:
    - train_data/Recognition/words/Training_Set/gt.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - RecConAug:
        prob: 0.5
        ext_data_num: 2
        image_shape: [48, 320, 3]
        max_text_length: 25
    - RecAug:
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  sampler:
    name: MultiScaleSampler
    scales: [[320, 32], [320, 48], [320, 64]]
    first_bs: &bs 48
    fix_bs: false
    divided_factor: [8, 16] # w, h
    is_training: True
  loader:
    shuffle: true
    batch_size_per_card: *bs
    drop_last: true
    num_workers: 0
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: train_data/Recognition/words/Validation_Set/
    label_file_list:
    - train_data/Recognition/words/Validation_Set/gt.txt
    transforms:
    - DecodeImage:
        img_mode: BGR
        channel_first: false
    - MultiLabelEncode:
        gtc_encode: NRTRLabelEncode
    - RecResizeImg:
        image_shape: [3, 48, 320]
    - KeepKeys:
        keep_keys:
        - image
        - label_ctc
        - label_gtc
        - length
        - valid_ratio
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 48
    num_workers: 0

and I use the following command to do the training:

python3 tools/train.py -c train_data/Recognition/arabic_PP-OCRv4_fineTune.yml
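
As a side note, a quick sanity check of the label file against the dictionary and max_text_length from the config can be done with a short script like this (my own sketch, not part of PaddleOCR; it only assumes the standard "image_path<TAB>label" format of recognition label files):

# sketch: flag labels that exceed max_text_length or contain characters
# that are missing from the recognition dictionary
MAX_TEXT_LENGTH = 25  # Global.max_text_length in the config above

with open("ppocr/utils/dict/arabic_dict.txt", encoding="utf-8") as f:
    charset = {line.rstrip("\n") for line in f if line.rstrip("\n")}
charset.add(" ")  # Global.use_space_char is true

with open("train_data/Recognition/words/Training_Set/gt.txt", encoding="utf-8") as f:
    for line_no, line in enumerate(f, 1):
        img_path, _, label = line.rstrip("\n").partition("\t")
        if len(label) > MAX_TEXT_LENGTH:
            print(f"line {line_no}: label longer than {MAX_TEXT_LENGTH} ({img_path})")
        missing = set(label) - charset
        if missing:
            print(f"line {line_no}: characters not in dict {sorted(missing)} ({img_path})")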

The accuracy starts out very poor but climbs to a very high level reasonably quickly (over 95%). However, the model itself seems to perform worse as the reported accuracy increases.

If I take an image from my training set and run recognition on it using an early checkpoint, while the reported accuracy is still low:

 python3 tools/infer_rec.py -c train_data/Recognition/arabic_PP-OCRv4_fineTune.yml -o Global.pretrained_model=output/rec/arabic_rec_finetune_ppocrv4/iter_epoch_10 Global.infer_img=/train_data/Recognition/words/Training_Set/word_14319.png

it will typically recognise the word with high accuracy.

However, if I let the training continue until the reported accuracy has increased, the model's ability to recognise anything from either the training or validation set degrades in practice, eventually returning an empty result with confidence 0.0. For example, here are the results for a few of the sample files, using the checkpoints from successive epochs for inference:

word1.png      
result: خشا 0.899590790271759
result: خلا 0.7412670254707336
result: ا 0.9578542709350586
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
word2.png
result: ةنا 0.8191770911216736
result: ا 0.9644339680671692
result: ا 0.9546175599098206
result: ا 0.8386390805244446
result: ا 0.908350944519043
result: ا 0.48394304513931274
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
word3.png
result: زح 0.8297013640403748
result: زح 0.7330917119979858
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 
result: ' ' 0.0 

and so on. Meanwhile, if I use the model that hasn't been fine-tuned, it does a good job of recognising the characters.
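
As far as I can tell, the '' / 0.0 results are what CTCLabelDecode produces when the CTC head predicts blank at every timestep: greedy decoding collapses consecutive repeats and drops blanks, so an all-blank sequence decodes to an empty string, leaving no per-character confidences to average. A simplified sketch of that decoding step (my own illustration, not the actual PaddleOCR implementation):

import numpy as np

def greedy_ctc_decode(probs, charset, blank=0):
    # probs: array of shape (T, num_classes) with per-timestep probabilities
    ids = probs.argmax(axis=1)
    confs = probs.max(axis=1)
    chars, kept = [], []
    prev = blank
    for idx, conf in zip(ids, confs):
        # collapse consecutive repeats, then drop blanks
        if idx != blank and idx != prev:
            chars.append(charset[idx])
            kept.append(conf)
        prev = idx
    # if every timestep is blank, nothing is kept -> ('', 0.0)
    text = "".join(chars)
    return text, (float(np.mean(kept)) if kept else 0.0)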

However, the reported accuracy keeps increasing:

epoch: [1/200], acc: 0.031250,
epoch: [2/200], acc: 0.208333,
epoch: [3/200], acc: 0.463542,
epoch: [4/200], acc: 0.625000,
epoch: [5/200], acc: 0.625000,
epoch: [6/200], acc: 0.661458,
epoch: [7/200], acc: 0.723958,
epoch: [8/200], acc: 0.786458,
epoch: [9/200], acc: 0.786458,
epoch: [10/200], acc: 0.802083,
epoch: [11/200], acc: 0.781250,
epoch: [12/200], acc: 0.802083,
epoch: [13/200], acc: 0.781250,
epoch: [14/200], acc: 0.802083,
epoch: [15/200], acc: 0.822916,
epoch: [16/200], acc: 0.812500,
epoch: [17/200], acc: 0.812500,
epoch: [18/200], acc: 0.854166,
epoch: [19/200], acc: 0.843750,
epoch: [20/200], acc: 0.843750,
epoch: [21/200], acc: 0.843750,
epoch: [22/200], acc: 0.875000,
epoch: [23/200], acc: 0.843750,
epoch: [24/200], acc: 0.875000,
epoch: [25/200], acc: 0.875000,

But in practice the performance gets significantly worse.
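
(For context: with cal_metric_during_train set to true, I believe the per-epoch acc values above are computed on training batches, and RecMetric's acc is an exact full-string match rate, roughly like the simplified sketch below, which is my own reading rather than the actual PaddleOCR code.)

def rec_accuracy(preds, labels, ignore_space=True):
    # sketch: "acc" as the fraction of samples whose predicted string
    # exactly matches its label, optionally ignoring spaces
    correct = 0
    for pred, label in zip(preds, labels):
        if ignore_space:
            pred, label = pred.replace(" ", ""), label.replace(" ", "")
        correct += int(pred == label)
    return correct / max(len(labels), 1)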

Can anyone suggest what may be happening here?

juvebogdan commented 5 months ago

Try finetuning for just 5 or so epochs and see how that does.
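
For example, you can cap the epoch count from the command line without editing the yml (PaddleOCR's -o overrides take precedence over the config file):

python3 tools/train.py -c train_data/Recognition/arabic_PP-OCRv4_fineTune.yml -o Global.epoch_num=5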