PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0
43.01k stars 7.72k forks

Vertical text recognition #4084

Closed susanin1970 closed 2 years ago

susanin1970 commented 3 years ago

Hello! Thanks for this great toolkit :)

The main page of the PaddleOCR repository says that it supports vertical text recognition. I have a dataset of vertically oriented intermodal container numbers. It includes 21k images with numbers like this:

APHU6881694__vmtp1_20190322142814_1428_2_06523__1

Can I train one of the models in the PaddleOCR model zoo on this dataset? Which model is preferable for this dataset, and which training settings are better to choose (input image resolution, etc.)?

I tried to train CRNN with a MobileNetV3 backbone on these images. In the YAML file, I set the image shape in the transforms to [3, 240, 35] to keep the original orientation, but when starting the training I got an AssertionError. In one of the tickets, I saw that the image height during the transformation should preferably be 32.

Can I transform my dataset by rotating each image 90 degrees, as shown below, and train one of the PaddleOCR models to get correct recognition in this case?
APHU6761913__vmtp1_20190114164233_2_02368__1
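
To see why orientation matters at a fixed input height of 32, here is a rough sketch of an aspect-preserving resize. It is not PaddleOCR's actual resize code: the helper name `resized_width` is hypothetical, and the 35 x 240 source size is only inferred from the [3, 240, 35] shape mentioned above.

```python
import math


def resized_width(src_w: int, src_h: int, target_h: int = 32, max_w: int = 200) -> int:
    """Width after scaling to target_h with the aspect ratio kept, capped at max_w."""
    ratio = src_w / float(src_h)
    return min(max_w, int(math.ceil(target_h * ratio)))


# Assumed crop size: 35 px wide and 240 px tall in the original (vertical) orientation.
vertical = resized_width(src_w=35, src_h=240)
# The same crop after a 90-degree rotation: 240 px wide and 35 px tall.
rotated = resized_width(src_w=240, src_h=35)
print(vertical, rotated)
```

Under these assumptions the vertical crop shrinks to a strip only 5 pixels wide, while the rotated crop fills the full 200-pixel width, which leaves far more horizontal positions for an 11-character string.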

Thanks in advance for your reply!

littletomatodonkey commented 3 years ago

The below figure is needed for vertical text recognition.

image

susanin1970 commented 3 years ago

Is it enough for me to rotate the images with numbers by 90 degrees and organize the training in approximately the same way as described in the tutorial?

susanin1970 commented 3 years ago

As an experiment, I tried to train a CRNN model with a MobileNetV3 backbone on a small dataset similar to the one mentioned above.
It consists of 1100 unique vertical numbers of the following form:
image

The configuration for training looks like this:

Global:
  use_gpu: true
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/rec/ic15/
  save_epoch_step: 100
  # evaluation is run every 100 iterations, starting from iteration 0
  eval_batch_step: [0, 100]
  cal_metric_during_train: True
  pretrained_model: D:\Repositories\PaddleOCR\rec_mv3_none_bilstm_ctc_v2.0_train\best_accuracy
  checkpoints: 
  save_inference_dir: ./
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: D:\Repositories\PaddleOCR\ppocr\utils\dict\custom_en_dict.txt
  character_type: ch
  max_text_length: 11
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_ic15.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.0001
  regularizer:
    name: 'L2'
    factor: 0

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_VERTICAL_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 200]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 128
    drop_last: True
    num_workers: 8
    use_shared_memory: False

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_VERTICAL_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 200]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 128
    num_workers: 8
    use_shared_memory: False

The training dataset is the same as the test dataset
The file en_dict_with_sharp.py includes uppercase Latin characters and digits only

During training, the accuracy tends to 1

[2021/09/21 13:09:32] root INFO: epoch: [3515/5000], iter: 24600, lr: 0.000100, loss: 0.011204, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00032 s, batch_cost: 0.00288 s, samples: 384, ips: 1333.36154
eval model::  89%|█████████████████████████████████████████████████████████████████▊        | 8/9 [00:00<00:00, 14.25it/s]
[2021/09/21 13:09:33] root INFO: cur metric, acc: 1.0, norm_edit_dis: 1.0, fps: 3190.9046978349975

After training, I export the model with the best accuracy to the inference format:

python .\tools\export_model.py -c .\configs\rec\rec_icdar15_train_containers.yml -o Global.pretrained_model=D:\Repositories\PaddleOCR\output\rec\ic15\best_accuracy -o Global.save_inference_dir=.\output\rec\ic15\inference_model

Even though the path to the trained model is passed via Global.pretrained_model on the command line, the model that actually gets exported is the one whose path is given by the pretrained_model parameter in the YAML file.
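
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the export script declares `-o` with argparse's `nargs="+"` and the default store action (worth checking in tools/program.py for your version), then a second `-o` replaces the first instead of adding to it. The parser below is a stand-in built for illustration, not PaddleOCR's own, and the values `A` and `B` are placeholders.

```python
import argparse

# Stand-in parser; the option names mirror the command above, the rest is assumed.
parser = argparse.ArgumentParser()
parser.add_argument("-c", "--config")
parser.add_argument("-o", "--opt", nargs="+")

# Two separate -o flags: the default "store" action keeps only the last one.
separate = parser.parse_args(
    ["-c", "cfg.yml", "-o", "Global.pretrained_model=A", "-o", "Global.save_inference_dir=B"]
)

# One -o flag followed by both overrides: both values are kept.
combined = parser.parse_args(
    ["-c", "cfg.yml", "-o", "Global.pretrained_model=A", "Global.save_inference_dir=B"]
)

print(separate.opt)
print(combined.opt)
```

If the script does behave this way, listing both overrides after a single `-o` would keep the Global.pretrained_model override from being dropped.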

After exporting, I tried to run inference with the trained model on my small dataset using the Python API:

from paddleocr import PaddleOCR
import cv2
import os

if __name__ == "__main__":
    # Recognition-only setup: detection is disabled and a custom dictionary is used.
    ocr = PaddleOCR(
        rec_model_dir=r"D:\Repositories\PaddleOCR\output\rec\ic15\inference_model",
        rec_char_dict_path=r"D:\Repositories\PaddleOCR\ppocr\utils\dict\custom_en_dict.txt",
        use_gpu=True,
        rec_image_shape="3, 32, 200",
        det=False,
    )
    path_to_test_images = r"D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET\iso_vertical_small"
    for image in os.listdir(path_to_test_images):
        # The ground-truth number is the part of the file name before the first underscore.
        label = image.split("_")[0]
        opencv_image = cv2.imread(os.path.join(path_to_test_images, image))
        result = ocr.ocr(opencv_image, cls=False, det=False)
        predicted = result[0][0]
        print(f"{label} -- {predicted}")

And I get this:

AMFU8882172 -- CAIU2067089749
APHU6432163 -- TGHU000101047914410171707
APHU6506564 -- HGIU10171116490196
APHU6761913 -- TRKU3019975701683
APHU6881694 -- CASU18001217
APHU7016941 -- TRHU11018011
APHU7098180 -- CAIU40107171987910491002
APHU7198229 -- FCIU1001411115818199935
APHU7393781 -- CKIU00191851
APZU3269939 -- TRHU211171203004288
APZU3647406 -- TCLU713493710319
APZU3846120 -- CRIU07611401912906
APZU3893185 -- TGHU170169718408100190
APZU4409167 -- DRYU711700608110051
AXIU1357771 -- CKIU801111076471411141
AXIU1478475 -- TCIU701714071707018008
AXIU1644323 -- TCIU07111113101327981897
AXIU1645485 -- CRKU011001104013711
AXIU1654949 -- FXIU1011043032011
AXIU2182092 -- CAIU100178174108148
BEAU4094047 -- FESU0118461789
BEAU4112557 -- FESU3910717171119296911
BEAU4155013 -- CRIU17011718071871180
BEAU4865951 -- ASU92178999230
BHCU3077648 -- DCSU09135409110434
BMOU1278702 -- FESU171717911082
BMOU2033999 -- MSU93184107007
BMOU2083388 -- INU0718117000034088802
BMOU2402633 -- FASU1102010204570
BMOU2420344 -- FESU7100971818071835
BMOU2783925 -- FSFU711013100671077
BMOU2921536 -- TCHU10714571770
BMOU4106590 -- ASU210106040
BMOU4442698 -- FAIU73701130245184
BMOU4560617 -- ASIU7014497101
BMOU4703662 -- FESU0101141900797180
BMOU4757843 -- FCHU1130181172
BMOU5169320 -- TCHU160017717954
BMOU6213788 -- BEOU1013071180
BMOU6230511 -- FCSU3718442719121
BMOU6358734 -- HASU00727340971094
BMOU6371551 -- CAXU8801709113711
BMOU6385093 -- DRYU10102834052097
BMOU6446814 -- CAIU7800715
BMOU6919809 -- CAIU78771009
BMOU6925329 -- CAIU1170607747207
BMOU6930829 -- TEKU111979182730
BMOU6931065 -- FESU11717190
BSIU2330210 -- FSU7110013010190
BSIU2550976 -- TGHU7111147010
BSIU2739147 -- CAIU110717171
BSIU9085921 -- FEIU1091510427814
BSIU9179677 -- XKU2117100470710400
BSIU9206290 -- FRIU11718700717007
BSIU9516485 -- TRLU0710710100740
CAIU2047185 -- CXIU1100109111705
CAIU2221949 -- DRIU1047181084
CAIU2238474 -- DMIU157240577104
CAIU2309393 -- TRIU107111771000170010
CAIU2354237 -- CAIU1076091003
CAIU2383235 -- CAIU101704009702540
CAIU2403943 -- CAIU161171106141060
CAIU2449883 -- CRIU717171711000800006
CAIU2456250 -- FAIU171314080740
CAIU2520045 -- TKU11197000107
CAIU2636396 -- CIU1717700971701504
CAIU2697112 -- GHKU117470087011118
CAIU2709511 -- DRKU177471100481115
CAIU2741555 -- TRLU11057315410790
CAIU2753371 -- DIU1917770717
CAIU2869145 -- FRSU13489400119894
CAIU3034198 -- TCLU17914719770
CAIU3125467 -- TRIU17071061
CAIU3135043 -- TRHU2177101777107807
CAIU3162440 -- TRLU110017837771780
CAIU3166872 -- CAIU11717870602
CAIU3332696 -- CAIU177457770608
CAIU3375738 -- FCIU7171787114040
CAIU3378803 -- DCIU71117709710079078
CAIU3385926 -- CXIU111182306282
CAIU3415039 -- FAIU3101484120006
CAIU3647800 -- FEIU114986927100
CAIU3651292 -- IAIU1145783718987
CAIU3652365 -- CRXU71712755
CAIU3678776 -- AIU11111798191166
CAIU3714810 -- TLIU171747711930100
CAIU3728692 -- DRYU111711079000170
CAIU3734150 -- CAIU1107177676161101
CAIU4341075 -- TCLU70747117144
CAIU4341141 -- FCIU13710711781
CAIU4342235 -- FAIU0711101704490180
CAIU4355680 -- TRIU80101111007100710
CAIU4371402 -- DFSU11100110770
CAIU4936181 -- FCIU11147701001
CAIU4937948 -- TCIU101717548
CAIU6262859 -- FAIU11147563010524
CAIU7041596 -- CRIU1910109714307607
CAIU7477255 -- CAIU10711110018821
CAIU7847787 -- FAIU114404711077
CAIU7848783 -- RYU10171815135
CAIU7849820 -- CAIU1107683800
CAIU7852187 -- AXIU10107097147016
CAIU7852192 -- FRIU7341008112
CAIU7852490 -- TCLU131075720
CAIU7852783 -- TCLU117174849715

Although acc tends to 1 during training, the recognition accuracy at inference is poor.
And although the max_text_length parameter in the configuration is 11, some recognized strings clearly contain more than 11 characters.
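
A quick way to quantify both observations is to parse the `label -- predicted` lines, count exact matches, and flag predictions longer than the expected 11 characters. This is only a small sketch: `summarize` is a hypothetical helper, and the three sample lines are copied from the output above.

```python
def summarize(lines, max_len=11):
    """Return (exact-match accuracy, predictions longer than max_len)."""
    pairs = [
        tuple(part.strip() for part in line.split("--", 1))
        for line in lines
        if "--" in line
    ]
    correct = sum(1 for label, predicted in pairs if label == predicted)
    overlong = [predicted for _, predicted in pairs if len(predicted) > max_len]
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, overlong


# Three lines taken from the inference output above.
sample = [
    "AMFU8882172 -- CAIU2067089749",
    "APHU6881694 -- CASU18001217",
    "BMOU6446814 -- CAIU7800715",
]
accuracy, overlong = summarize(sample)
print(f"accuracy: {accuracy:.2%}, overlong predictions: {overlong}")
```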

Could you please tell me how I can fix this? Maybe I should apply some special transformations or try changing the model altogether?

I also tried learning from scratch, but the results were about the same

susanin1970 commented 3 years ago

Could anyone who has faced a similar problem please answer?

BaofengZan commented 2 years ago

Did you eventually manage to train a model for vertical text?

susanin1970 commented 2 years ago

Yep, I trained CRNN and SRN for vertical text successfully

BaofengZan commented 2 years ago

OK, thanks bro. One more question: were the training images with numbers rotated by 90 degrees?

susanin1970 commented 2 years ago

Yes, before training I rotated the images with these numbers by 90 degrees
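
For anyone repeating this step, a minimal sketch of the rotation with NumPy follows. The direction is an assumption, since the thread does not say which way the images were turned: a counterclockwise turn moves the top of the image to the left, so text stacked top to bottom ends up running left to right. `rotate_vertical_crop` is a hypothetical helper name.

```python
import numpy as np


def rotate_vertical_crop(img: np.ndarray) -> np.ndarray:
    """Rotate an H x W (x C) image 90 degrees counterclockwise."""
    # np.rot90 rotates in the plane of the first two axes, so colour channels are untouched.
    # It returns a view, so a contiguous copy is made for use with other libraries.
    return np.ascontiguousarray(np.rot90(img, k=1))


# Dummy 240 x 35 BGR crop standing in for a real image loaded with cv2.imread.
crop = np.zeros((240, 35, 3), dtype=np.uint8)
rotated = rotate_vertical_crop(crop)
print(crop.shape, "->", rotated.shape)
```

With OpenCV, `cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE)` performs the same rotation.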

sufiyansaqib commented 2 years ago

Hi @susanin1970, were you able to get accurate results for vertical text? I am encountering the same problem. Can you please let me know?

sufiyansaqib commented 2 years ago

Can someone please help me with the vertical image dataset? I can't find it on the internet.

mariacao2012 commented 2 years ago

Hi @susanin1970, were you able to get accurate results for text rotated by 90 degrees? We are also exploring PaddleOCR for detecting rotated text. Thanks

susanin1970 commented 2 years ago

Sorry for the late reply

SRN for vertical text recognition was trained on 1114 unique zones with text, which look like this:
image
These numbers were rotated by 90 degrees.

The training set has 946 images and the test set has 168 images
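
A sketch of how such a split and the label files could be produced. The thread does not say how the 946/168 split was made, so the seeded random shuffle, the helper names, and the assumption that each file name starts with its label followed by an underscore (as in the inference script earlier in the thread) are all illustrative. Each label line is the image path and the text separated by a tab, which is the format SimpleDataSet reads.

```python
import random


def make_label_lines(file_names):
    """Build 'path<TAB>label' lines, taking the label from the file-name prefix."""
    return [f"{name}\t{name.split('_')[0]}" for name in file_names]


def split_train_test(lines, test_size, seed=0):
    """Shuffle deterministically and hold out test_size lines for evaluation."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# Placeholder file names; replace with os.listdir(...) on the real image folder.
names = [f"ABCU{i:07d}__crop.jpg" for i in range(1114)]
train, test = split_train_test(make_label_lines(names), test_size=168)
print(len(train), len(test))
```

Writing `train` and `test` to separate files such as rec_gt_train.txt and rec_gt_test.txt keeps the evaluation images out of training.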

The SRN config for training looks like this:

Global:
  use_gpu: True
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/rec/srn_horizontal_two_strings_new
  save_epoch_step: 100
  # evaluation is run every 100 iterations, starting from iteration 0
  eval_batch_step: [0, 100]
  cal_metric_during_train: True
  pretrained_model: D:\Repositories\PaddleOCR\rec_r50_vd_srn_train\best_accuracy
  checkpoints:
  save_inference_dir: ./
  use_visualdl: False
  infer_img: doc/imgs_words/ch/word_1.jpg
  # for data or label process
  character_dict_path: D:\Repositories\PaddleOCR\ppocr\utils\dict\en_dict.txt
  character_type: en
  max_text_length: 25
  num_heads: 8
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_srn.txt

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10.0
  lr:
    learning_rate: 0.0001

Architecture:
  model_type: rec
  algorithm: SRN
  in_channels: 1
  Transform:
  Backbone:
    name: ResNetFPN
  Head:
    name: SRNHead
    max_text_length: 25
    num_heads: 8
    num_encoder_TUs: 2
    num_decoder_TUs: 4
    hidden_dims: 512

Loss:
  name: SRNLoss

PostProcess:
  name: SRNLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode: # Class handling label
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image',
                      'label',
                      'length',
                      'encoder_word_pos',
                      'gsrm_word_pos',
                      'gsrm_slf_attn_bias1',
                      'gsrm_slf_attn_bias2'] # dataloader will return list in this order
  loader:
    shuffle: False
    batch_size_per_card: 16
    drop_last: False
    num_workers: 2

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\\rec_gt_test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode: # Class handling label
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image',
                      'label',
                      'length',
                      'encoder_word_pos',
                      'gsrm_word_pos',
                      'gsrm_slf_attn_bias1',
                      'gsrm_slf_attn_bias2'] 
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 16
    num_workers: 2

After training, I converted the trained model to ONNX format, optimized and quantized it, and tested it on the test dataset.
Out of 168 numbers in the test dataset, either 162 or 163 were recognized correctly, i.e. an accuracy of about 96-97%
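
The arithmetic behind that range, as a quick check using only the counts quoted above:

```python
total = 168
# The two reported counts of correctly recognized numbers.
accuracies = [correct / total for correct in (162, 163)]
for correct, acc in zip((162, 163), accuracies):
    print(f"{correct}/{total} = {acc:.2%}")
```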

codecrack3 commented 2 years ago

I have the same task as you. Can you provide your datasets? Thanks

paddle-bot-old[bot] commented 2 years ago

Since you haven't replied for more than 3 months, we have closed this issue/pr. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.

aisensiy commented 2 years ago

There are two steps in OCR: the first is text detection and the second is text recognition. Here you are talking about the recognition part. But I am wondering whether it is also necessary to rotate the image for the text detection part?

nulla-dies-sine-linea commented 2 years ago

Can you provide the datasets with numbers? Thanks.

ShivarajPatilaa commented 7 months ago

Can you provide me with a model?

chandramouli3739 commented 7 months ago

Hi @susanin1970, can you provide me with the vertical text dataset? I need it for my work.

khanfarhan10 commented 3 months ago

The training dataset is the same as the test dataset

@susanin1970 the training doesn't work because it suffers from overfitting.