Closed susanin1970 closed 2 years ago
The figure below shows the kind of vertical text that needs to be recognized.
Is it enough to rotate the images with numbers by 90 degrees and organize the training in approximately the same way as described in the tutorial?
I tried to train a CRNN model with a MobileNetV3 backbone as an experiment on a small dataset similar to the one mentioned above.
It consists of 1100 unique vertical numbers of the following form:
The configuration for training looks like this:
Global:
  use_gpu: true
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 100
  save_model_dir: ./output/rec/ic15/
  save_epoch_step: 100
  # evaluation is run every 2000 iterations
  eval_batch_step: [0, 100]
  cal_metric_during_train: True
  pretrained_model: D:\Repositories\PaddleOCR\rec_mv3_none_bilstm_ctc_v2.0_train\best_accuracy
  checkpoints:
  save_inference_dir: ./
  use_visualdl: False
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: D:\Repositories\PaddleOCR\ppocr\utils\dict\custom_en_dict.txt
  character_type: ch
  max_text_length: 11
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_ic15.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.0001
  regularizer:
    name: 'L2'
    factor: 0
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 96
  Head:
    name: CTCHead
    fc_decay: 0
Loss:
  name: CTCLoss
PostProcess:
  name: CTCLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_VERTICAL_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 200]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 128
    drop_last: True
    num_workers: 8
    use_shared_memory: False
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_VERTICAL_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 200]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 128
    num_workers: 8
    use_shared_memory: False
The training dataset is the same as the test dataset
The file en_dict_with_sharp.py includes uppercase Latin characters and digits only.
During training, the accuracy approaches 1:
[2021/09/21 13:09:32] root INFO: epoch: [3515/5000], iter: 24600, lr: 0.000100, loss: 0.011204, acc: 1.000000, norm_edit_dis: 1.000000, reader_cost: 0.00032 s, batch_cost: 0.00288 s, samples: 384, ips: 1333.36154
eval model:: 89%|█████████████████████████████████████████████████████████████████▊ | 8/9 [00:00<00:00, 14.25it/s]
[2021/09/21 13:09:33] root INFO: cur metric, acc: 1.0, norm_edit_dis: 1.0, fps: 3190.9046978349975
After training, I export the model with the best accuracy to the inference format:
python .\tools\export_model.py -c .\configs\rec\rec_icdar15_train_containers.yml -o Global.pretrained_model=D:\Repositories\PaddleOCR\output\rec\ic15\best_accuracy -o Global.save_inference_dir=.\output\rec\ic15\inference_model
Even if the path to the trained model is specified in Global.pretrained_model on the command line, the model that gets exported is the one whose path is specified in the pretrained_model parameter in the YAML file.
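A possible explanation, offered as an assumption rather than something confirmed in this thread: if the export script collects its overrides through an argparse option declared with nargs="+", then repeating -o replaces the earlier list instead of extending it, so only the last override survives. The sketch below reproduces that behaviour with a stand-in parser; the parser and the shortened paths are hypothetical and only mimic the -o flag.

```python
import argparse

# Stand-in parser: an "-o/--opt" flag that accepts one or more KEY=VALUE overrides.
# That the real export script declares the flag this way is an assumption.
parser = argparse.ArgumentParser()
parser.add_argument("-o", "--opt", nargs="+", default=[])


def parse_overrides(argv):
    """Return the KEY=VALUE overrides found in argv as a dict."""
    args = parser.parse_args(argv)
    return dict(item.split("=", 1) for item in args.opt)


# Two separate -o flags: the second list replaces the first one.
repeated = parse_overrides([
    "-o", "Global.pretrained_model=./output/rec/ic15/best_accuracy",
    "-o", "Global.save_inference_dir=./output/rec/ic15/inference_model",
])

# One -o flag followed by both overrides: both are kept.
combined = parse_overrides([
    "-o", "Global.pretrained_model=./output/rec/ic15/best_accuracy",
    "Global.save_inference_dir=./output/rec/ic15/inference_model",
])

print(sorted(repeated))
print(sorted(combined))
```

If this is what happens here, passing both overrides after a single -o would be worth trying before changing anything in the model or the data.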
After exporting, I tried to run inference with the trained model on my small dataset using the Python API:
from paddleocr import PaddleOCR, draw_ocr
import cv2
import numpy as np
import os

if __name__ == "__main__":
    # Recognition-only pipeline that loads the exported inference model
    ocr = PaddleOCR(
        rec_model_dir=r"D:\Repositories\PaddleOCR\output\rec\ic15\inference_model",
        rec_char_dict_path="D:\\Repositories\\PaddleOCR\\ppocr\\utils\\dict\\custom_en_dict.txt",
        use_gpu=True,
        rec_image_shape="3, 32, 200",
        det=False
    )
    path_to_test_images = r"D:\Repositories\PaddleOCR\datasets\ISO_VERTICAL_LPRNET_DATASET\iso_vertical_small"
    for image in os.listdir(path_to_test_images):
        # The expected label is the part of the file name before the first underscore
        label = image.split("_")[0]
        opencv_image = cv2.imread(os.path.join(path_to_test_images, image))
        # Skip detection and angle classification, run recognition only
        result = ocr.ocr(opencv_image, cls=False, det=False)
        predicted = result[0][0]
        print(f"{label} -- {predicted}")
And I get this:
AMFU8882172 -- CAIU2067089749
APHU6432163 -- TGHU000101047914410171707
APHU6506564 -- HGIU10171116490196
APHU6761913 -- TRKU3019975701683
APHU6881694 -- CASU18001217
APHU7016941 -- TRHU11018011
APHU7098180 -- CAIU40107171987910491002
APHU7198229 -- FCIU1001411115818199935
APHU7393781 -- CKIU00191851
APZU3269939 -- TRHU211171203004288
APZU3647406 -- TCLU713493710319
APZU3846120 -- CRIU07611401912906
APZU3893185 -- TGHU170169718408100190
APZU4409167 -- DRYU711700608110051
AXIU1357771 -- CKIU801111076471411141
AXIU1478475 -- TCIU701714071707018008
AXIU1644323 -- TCIU07111113101327981897
AXIU1645485 -- CRKU011001104013711
AXIU1654949 -- FXIU1011043032011
AXIU2182092 -- CAIU100178174108148
BEAU4094047 -- FESU0118461789
BEAU4112557 -- FESU3910717171119296911
BEAU4155013 -- CRIU17011718071871180
BEAU4865951 -- ASU92178999230
BHCU3077648 -- DCSU09135409110434
BMOU1278702 -- FESU171717911082
BMOU2033999 -- MSU93184107007
BMOU2083388 -- INU0718117000034088802
BMOU2402633 -- FASU1102010204570
BMOU2420344 -- FESU7100971818071835
BMOU2783925 -- FSFU711013100671077
BMOU2921536 -- TCHU10714571770
BMOU4106590 -- ASU210106040
BMOU4442698 -- FAIU73701130245184
BMOU4560617 -- ASIU7014497101
BMOU4703662 -- FESU0101141900797180
BMOU4757843 -- FCHU1130181172
BMOU5169320 -- TCHU160017717954
BMOU6213788 -- BEOU1013071180
BMOU6230511 -- FCSU3718442719121
BMOU6358734 -- HASU00727340971094
BMOU6371551 -- CAXU8801709113711
BMOU6385093 -- DRYU10102834052097
BMOU6446814 -- CAIU7800715
BMOU6919809 -- CAIU78771009
BMOU6925329 -- CAIU1170607747207
BMOU6930829 -- TEKU111979182730
BMOU6931065 -- FESU11717190
BSIU2330210 -- FSU7110013010190
BSIU2550976 -- TGHU7111147010
BSIU2739147 -- CAIU110717171
BSIU9085921 -- FEIU1091510427814
BSIU9179677 -- XKU2117100470710400
BSIU9206290 -- FRIU11718700717007
BSIU9516485 -- TRLU0710710100740
CAIU2047185 -- CXIU1100109111705
CAIU2221949 -- DRIU1047181084
CAIU2238474 -- DMIU157240577104
CAIU2309393 -- TRIU107111771000170010
CAIU2354237 -- CAIU1076091003
CAIU2383235 -- CAIU101704009702540
CAIU2403943 -- CAIU161171106141060
CAIU2449883 -- CRIU717171711000800006
CAIU2456250 -- FAIU171314080740
CAIU2520045 -- TKU11197000107
CAIU2636396 -- CIU1717700971701504
CAIU2697112 -- GHKU117470087011118
CAIU2709511 -- DRKU177471100481115
CAIU2741555 -- TRLU11057315410790
CAIU2753371 -- DIU1917770717
CAIU2869145 -- FRSU13489400119894
CAIU3034198 -- TCLU17914719770
CAIU3125467 -- TRIU17071061
CAIU3135043 -- TRHU2177101777107807
CAIU3162440 -- TRLU110017837771780
CAIU3166872 -- CAIU11717870602
CAIU3332696 -- CAIU177457770608
CAIU3375738 -- FCIU7171787114040
CAIU3378803 -- DCIU71117709710079078
CAIU3385926 -- CXIU111182306282
CAIU3415039 -- FAIU3101484120006
CAIU3647800 -- FEIU114986927100
CAIU3651292 -- IAIU1145783718987
CAIU3652365 -- CRXU71712755
CAIU3678776 -- AIU11111798191166
CAIU3714810 -- TLIU171747711930100
CAIU3728692 -- DRYU111711079000170
CAIU3734150 -- CAIU1107177676161101
CAIU4341075 -- TCLU70747117144
CAIU4341141 -- FCIU13710711781
CAIU4342235 -- FAIU0711101704490180
CAIU4355680 -- TRIU80101111007100710
CAIU4371402 -- DFSU11100110770
CAIU4936181 -- FCIU11147701001
CAIU4937948 -- TCIU101717548
CAIU6262859 -- FAIU11147563010524
CAIU7041596 -- CRIU1910109714307607
CAIU7477255 -- CAIU10711110018821
CAIU7847787 -- FAIU114404711077
CAIU7848783 -- RYU10171815135
CAIU7849820 -- CAIU1107683800
CAIU7852187 -- AXIU10107097147016
CAIU7852192 -- FRIU7341008112
CAIU7852490 -- TCLU131075720
CAIU7852783 -- TCLU117174849715
Even though acc approaches 1 during training, the recognition accuracy at inference is poor.
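To put a number on the gap, the label and prediction pairs printed above can be scored offline. The sketch below computes exact-match accuracy and a normalized edit similarity in the spirit of the acc and norm_edit_dis values in the training log; the exact formula PaddleOCR uses is not reproduced here, and the helper names are illustrative. The three pairs are copied from the output above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def score(pairs):
    """Return (exact-match accuracy, mean normalized edit similarity)."""
    exact = sum(label == pred for label, pred in pairs) / len(pairs)
    similarity = sum(
        1 - edit_distance(label, pred) / max(len(label), len(pred), 1)
        for label, pred in pairs
    ) / len(pairs)
    return exact, similarity


# A few (label, prediction) pairs taken from the output listed above.
pairs = [
    ("AMFU8882172", "CAIU2067089749"),
    ("CAIU2354237", "CAIU1076091003"),
    ("BMOU6919809", "CAIU78771009"),
]

exact, similarity = score(pairs)
print(f"exact match: {exact:.3f}, normalized edit similarity: {similarity:.3f}")
```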
And although the max_text_length parameter in the configuration is 11, some recognized strings clearly contain more than 11 characters.
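One likely reason, stated as an assumption about how the pipeline behaves: with a CTC head, the decoded length is bounded by the number of time steps the network emits, not by max_text_length, which mainly constrains the labels used for training. A minimal greedy CTC decoder shows the effect; the blank index of 0, the toy alphabet, and the figure of 50 time steps for a 200-pixel-wide input are all illustrative choices.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(step_ids, charset):
    """Collapse consecutive repeated indices, then drop blanks."""
    collapsed = [idx for idx, _ in groupby(step_ids)]
    return "".join(charset[idx - 1] for idx in collapsed if idx != BLANK)


# Toy alphabet: index i + 1 maps to charset[i], index 0 is the blank.
charset = "0123456789ABCU"

# A confident prediction: repeats collapse and blanks separate characters.
clean_steps = [13, 13, 0, 11, 0, 14, 14]
print(ctc_greedy_decode(clean_steps, charset))

# A noisy prediction over 50 time steps that alternates a character and a blank.
# Nothing in the decoder stops the result from exceeding 11 characters.
noisy_steps = [2, 0] * 25
decoded = ctc_greedy_decode(noisy_steps, charset)
print(len(decoded))
```

Under this reading, over-long strings are a symptom of a model that does not fit the inference images rather than a setting that was ignored.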
How can I fix this? Should I apply some special transformations, or try a different model altogether?
I also tried training from scratch, but the results were about the same.
If anyone has faced a similar problem, please reply.
May I ask, did you eventually manage to train a model for vertical text?
Yep, I trained CRNN and SRN for vertical text successfully
OK, thanks. One more question: were the training images with the numbers rotated by 90 degrees?
Yes, before training I rotated the images with these numbers by 90 degrees.
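The rotation step can be sketched on a plain row-major grid. The thread does not say which direction was used, so counter-clockwise is an assumption here; it is chosen because it turns top-to-bottom order into left-to-right order. With OpenCV the equivalent call would be cv2.rotate(img, cv2.ROTATE_90_COUNTERCLOCKWISE).

```python
def rotate_90_ccw(rows):
    """Rotate a row-major 2D grid by 90 degrees counter-clockwise."""
    # Transposing turns columns into rows; reversing the row order
    # completes the counter-clockwise rotation.
    return [list(col) for col in zip(*rows)][::-1]


# Toy "image" with 4 rows and 2 columns: one stacked character per row.
vertical = [
    ["A", "a"],
    ["B", "b"],
    ["C", "c"],
    ["D", "d"],
]

horizontal = rotate_90_ccw(vertical)
for row in horizontal:
    print(" ".join(row))
```

After the rotation the grid has 2 rows and 4 columns, and the characters that were stacked from top to bottom now run from left to right, which is the layout the recognition models expect.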
Hi @susanin1970, were you able to get accurate results for vertical text? I am running into the same problem, so please let me know.
Can someone please help me with the vertical image dataset? I can't find it on the internet.
Hi @susanin1970, were you able to get accurate results for text rotated by 90 degrees? We are also exploring PaddleOCR for detecting rotated text. Thanks
Sorry for the late reply.
The SRN for vertical text recognition was trained on 1114 unique text zones, which look like this: These numbers were rotated by 90 degrees.
The training set has 946 images and the test set has 168 images.
The SRN training config looks like this:
Global:
  use_gpu: True
  epoch_num: 5000
  log_smooth_window: 20
  print_batch_step: 5
  save_model_dir: ./output/rec/srn_horizontal_two_strings_new
  save_epoch_step: 100
  # evaluation is run every 5000 iterations after the 4000th iteration
  eval_batch_step: [0, 100]
  cal_metric_during_train: True
  pretrained_model: D:\Repositories\PaddleOCR\rec_r50_vd_srn_train\best_accuracy
  checkpoints:
  save_inference_dir: ./
  use_visualdl: False
  infer_img: doc/imgs_words/ch/word_1.jpg
  # for data or label process
  character_dict_path: D:\Repositories\PaddleOCR\ppocr\utils\dict\en_dict.txt
  character_type: en
  max_text_length: 25
  num_heads: 8
  infer_mode: False
  use_space_char: False
  save_res_path: ./output/rec/predicts_srn.txt
Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 10.0
  lr:
    learning_rate: 0.0001
Architecture:
  model_type: rec
  algorithm: SRN
  in_channels: 1
  Transform:
  Backbone:
    name: ResNetFPN
  Head:
    name: SRNHead
    max_text_length: 25
    num_heads: 8
    num_encoder_TUs: 2
    num_decoder_TUs: 4
    hidden_dims: 512
Loss:
  name: SRNLoss
PostProcess:
  name: SRNLabelDecode
Metric:
  name: RecMetric
  main_indicator: acc
Train:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\\rec_gt_train.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode: # Class handling label
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image',
                      'label',
                      'length',
                      'encoder_word_pos',
                      'gsrm_word_pos',
                      'gsrm_slf_attn_bias1',
                      'gsrm_slf_attn_bias2'] # dataloader will return list in this order
  loader:
    shuffle: False
    batch_size_per_card: 16
    drop_last: False
    num_workers: 2
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: D:\Repositories\PaddleOCR\datasets\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\
    label_file_list: ["D:\\Repositories\\PaddleOCR\\datasets\\ISO_HORIZONTAL_TWO_STRING_LPRNET_DATASET\\rec_gt_test.txt"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - SRNLabelEncode: # Class handling label
      - SRNRecResizeImg:
          image_shape: [1, 64, 256]
      - KeepKeys:
          keep_keys: ['image',
                      'label',
                      'length',
                      'encoder_word_pos',
                      'gsrm_word_pos',
                      'gsrm_slf_attn_bias1',
                      'gsrm_slf_attn_bias2']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 16
    num_workers: 2
After training, I converted the trained model to ONNX format, optimized and quantized it, and tested it on the test dataset.
Out of 168 numbers in the test dataset, either 162 or 163 were recognized correctly, i.e. an accuracy of about 96-97%.
I have the same task as you. Can you provide your datasets? Thanks.
Since you haven't replied for more than 3 months, we have closed this issue/PR. If the problem is not solved or there is a follow-up one, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.
There are two steps in OCR: text detection and text recognition. Here you talk about the recognition part, but I am wondering whether it is also necessary to rotate the image for the text detection step?
Can you provide the datasets with numbers? Thanks.
Can you provide me with a model?
Hello! Thanks for this great toolkit :)
The main page of the PaddleOCR repository says that it supports vertical text recognition. I have a dataset of vertically oriented intermodal container numbers. It includes 21k images with numbers like this:
Can I train one of the models in the PaddleOCR model zoo on this dataset? Which model is preferable for this dataset, and which training settings are better to choose (input image resolution, etc.)?
I tried to train CRNN with a MobileNetV3 backbone on these images. In the YAML file, I set the image shape in the transforms section to [3, 240, 35] to keep the original orientation, but when starting the training I got an AssertionError. In one of the tickets, I saw that the image height during transformation should preferably be 32.
Can I transform my dataset by rotating each image by 90 degrees, as shown below, and train one of the PaddleOCR models so that I get correct recognition in this case?
Thanks in advance for your reply!
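To see why the orientation matters for a fixed input height of 32, the sketch below estimates the width an image ends up with when it is scaled to the target height with its aspect ratio kept and the result capped at the configured width. That the recognition resize step works this way is an assumption (it is a common scheme, and the exact behaviour may differ between versions); the function name and defaults are illustrative.

```python
import math


def resized_width(src_h, src_w, target_h=32, max_w=200):
    """Width after scaling to target_h with the aspect ratio kept, capped at max_w."""
    ratio = src_w / float(src_h)
    return min(max_w, int(math.ceil(target_h * ratio)))


# Upright vertical crop from the post: 240 pixels high, 35 pixels wide.
print(resized_width(240, 35))

# The same crop after a 90-degree rotation: 35 pixels high, 240 pixels wide.
print(resized_width(35, 240))

# With a wider cap, the rotated crop would keep its full aspect ratio.
print(resized_width(35, 240, max_w=320))
```

Under these assumptions, the upright crop collapses to a few pixels of width, while the rotated crop fills the whole input and is squeezed slightly by the 200-pixel cap, so a wider image_shape may be worth testing.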
Hi @susanin1970, can you provide me with the vertical text dataset? I need it for my work.
The training dataset is the same as the test dataset
@susanin1970 the training doesn't work because it suffers from overfitting.
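One way to make overfitting visible is to evaluate on images the model never saw during training. The sketch below splits a list of label lines into a training part and a held-out part; the "image path, tab, text" line format, the 15% fraction, and the generated sample lines are assumptions for illustration, not data from this thread.

```python
import random


def split_label_lines(lines, eval_fraction=0.15, seed=0):
    """Shuffle label lines reproducibly and split them into train and eval lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical label lines in "image_path<TAB>text" form.
lines = [f"images/{i:04d}.jpg\tABCU{i:07d}" for i in range(1100)]

train_lines, eval_lines = split_label_lines(lines)
print(len(train_lines), len(eval_lines))
```

The two lists can then be written to separate label files and referenced from the Train and Eval sections of the config, so that the reported accuracy reflects unseen images.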