en_PP-OCRv3_rec_train fine-tuning not working. Accuracy always starts from 0, and even after 500 epochs on a 1,500-image dataset the fine-tuned model gives worse results than the original pretrained model.

saurabhmali1 commented 3 weeks ago

Config:

    Global:
      debug: false
      use_gpu: true
      epoch_num: 300
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: ./output/v3_en_mobile
      save_epoch_step: 5
      eval_batch_step: [0, 2000]
      cal_metric_during_train: true
      pretrained_model: \rec\en\en_PP-OCRv3_rec_train\best_accuracy
      checkpoints:
      save_inference_dir:
      use_visualdl: false
      infer_img: doc/imgs_words/ch/word_1.jpg
      character_dict_path: PaddleOCR\ppocr\utils\en_dict.txt
      max_text_length: &max_text_length 100
      infer_mode: true
      use_space_char: true
      distributed: true
      save_res_path: ./output/rec/predicts_ppocrv3_en.txt

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Cosine
        learning_rate: 0.001
        warmup_epoch: 5
      regularizer:
        name: L2
        factor: 3.0e-05

    Architecture:
      model_type: rec
      algorithm: SVTR_LCNet
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
        last_pool_kernel_size: [2, 2]
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length

    Loss:
      name: MultiLoss
      loss_config_list:
        - CTCLoss:
        - SARLoss:

    PostProcess:
      name: CTCLabelDecode

    Metric:
      name: RecMetric
      main_indicator: acc
      ignore_space: False

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: dataset\rec\img
        ext_op_transform_idx: 1
        label_file_list:
          - dataset\rec\gt_test.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - RecConAug:
              prob: 0.5
              ext_data_num: 2
              image_shape: [48, 320, 3]
              max_text_length: *max_text_length
          - RecAug:
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: true
        batch_size_per_card: 1
        drop_last: true
        num_workers: 8

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: dataset\rec\img
        label_file_list:
          - dataset\rec\gt_test.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: false
        drop_last: false
        batch_size_per_card: 1
        num_workers: 8
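One quick sanity check on the data side of this config is to confirm that every line of the label file has the layout SimpleDataSet reads by default (image path, a tab, then the transcription), that every character appears in en_dict.txt (plus a space, since use_space_char is true), and that no label is longer than max_text_length. The helpers below (`load_charset`, `check_label_lines`) are hypothetical names of my own, not PaddleOCR functions, and the sketch assumes the default tab delimiter and one character per line in the dictionary file.

```python
from typing import Iterable, List, Set


def load_charset(dict_lines: Iterable[str], use_space_char: bool = True) -> Set[str]:
    """Build the character set from dictionary lines (one character per line)."""
    chars = {line.rstrip("\r\n") for line in dict_lines if line.rstrip("\r\n")}
    if use_space_char:
        chars.add(" ")  # mirrors use_space_char: true in the config
    return chars


def check_label_lines(label_lines: Iterable[str], charset: Set[str],
                      max_text_length: int = 100,
                      delimiter: str = "\t") -> List[str]:
    """Return human-readable problems found in 'image_path<TAB>label' lines."""
    problems = []
    for lineno, raw in enumerate(label_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        parts = line.split(delimiter, 1)
        if len(parts) != 2 or not parts[1]:
            problems.append(f"line {lineno}: missing delimiter or empty label")
            continue
        label = parts[1]
        unknown = sorted(set(label) - charset)
        if unknown:
            problems.append(f"line {lineno}: characters not in dict: {unknown}")
        if len(label) > max_text_length:
            problems.append(f"line {lineno}: label longer than {max_text_length}")
    return problems


if __name__ == "__main__":
    # Tiny in-memory example; with real files, pass open(...) handles instead.
    charset = load_charset(["a", "b", "c", "1"])
    sample = ["img/001.jpg\tabc 1", "img/002.jpg abc", "img/003.jpg\tab€"]
    for problem in check_label_lines(sample, charset):
        print(problem)
```

For the real dataset, the same functions can be fed `open(r"PaddleOCR\ppocr\utils\en_dict.txt", encoding="utf-8")` and `open(r"dataset\rec\gt_test.txt", encoding="utf-8")`.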

UserWangZz commented 3 weeks ago

Has the pretrained model been loaded correctly? If so, you can run inference with it on a few of your images and check whether training converges normally. If not, check that the pretrained model path is correct.
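In PaddleOCR, `pretrained_model` is a path prefix: for `...\best_accuracy`, the weights file that should exist is `best_accuracy.pdparams`. A minimal sketch of checking that, assuming this naming convention; `pretrained_params_file` and `pretrained_params_exist` are hypothetical helpers, not part of PaddleOCR.

```python
from pathlib import Path


def pretrained_params_file(prefix: str) -> Path:
    """Map a pretrained_model prefix to the .pdparams file it should point at."""
    if prefix.endswith(".pdparams"):
        prefix = prefix[: -len(".pdparams")]  # accept a full filename too
    return Path(prefix + ".pdparams")


def pretrained_params_exist(prefix: str) -> bool:
    """True if the expected .pdparams file exists on disk."""
    return pretrained_params_file(prefix).is_file()


if __name__ == "__main__":
    # Value copied from the config in this issue; adjust to your own layout.
    prefix = r"\rec\en\en_PP-OCRv3_rec_train\best_accuracy"
    path = pretrained_params_file(prefix)
    print(path, "found" if path.is_file() else "NOT found")
```

On Windows, a path that starts with a backslash is resolved from the root of the current drive, not from the PaddleOCR working directory, so it is worth printing the resolved path before training.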

saurabhmali1 commented 3 weeks ago

@UserWangZz Yes, it is loaded correctly. When I fine-tune my own pretrained model, training starts from a good accuracy, but when I fine-tune en_PP-OCRv3_rec with the config above and en_dict, it starts from 0 accuracy.
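When reading the logged numbers, it helps to recall that RecMetric's `acc` is an exact-match rate (a sample counts only if the whole predicted string equals the label), reported alongside `norm_edit_dis`. The sketch below is my own approximation of those two numbers, assuming a plain Levenshtein distance normalized by the longer string; it is not RecMetric's code. It shows how a model that gets one character wrong per line scores 0 accuracy while most characters are still right.

```python
from typing import Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def rec_scores(preds: Sequence[str], labels: Sequence[str],
               ignore_space: bool = False) -> Tuple[float, float]:
    """Return (exact-match accuracy, 1 - mean normalized edit distance)."""
    correct = 0
    dist_sum = 0.0
    for pred, label in zip(preds, labels):
        if ignore_space:
            pred, label = pred.replace(" ", ""), label.replace(" ", "")
        if pred == label:
            correct += 1
        longest = max(len(pred), len(label), 1)
        dist_sum += levenshtein(pred, label) / longest
    n = max(len(labels), 1)
    return correct / n, 1.0 - dist_sum / n


if __name__ == "__main__":
    # One wrong character per word: accuracy 0.0, edit similarity about 0.8.
    print(rec_scores(["hel1o", "wor1d"], ["hello", "world"]))
```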

UserWangZz commented 2 weeks ago

You can first use the eval script to evaluate the en_PP-OCRv3_rec model on your dataset.

UserWangZz commented 2 weeks ago

I can't find any errors in your config file.

saurabhmali1 commented 2 weeks ago

Is there a recommended image size or quality when generating a custom recognition dataset? I am working with old, blurry documents.

UserWangZz commented 2 weeks ago

This is an empirical question. We recommend that input images not be too wide, and blurry dataset images may hurt recognition results.
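To see which crops count as "too wide" for this config, note that with image_shape [3, 48, 320] any crop whose width/height ratio exceeds 320/48 (about 6.67) no longer fits at height 48 and, as far as I understand RecResizeImg, gets squeezed horizontally to width 320. A sketch that lists such crops from (name, width, height) tuples you collect yourself (for example with an image library); `squeezed_crops` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def squeezed_crops(sizes: Iterable[Tuple[str, int, int]], img_h: int = 48,
                   img_w: int = 320) -> List[Tuple[str, float]]:
    """Return (name, aspect_ratio) for crops wider than img_w / img_h allows."""
    out = []
    for name, w, h in sizes:
        # Integer form of ceil(img_h * w / h) > img_w, avoiding float rounding.
        if w * img_h > h * img_w:
            out.append((name, w / h))
    return out


if __name__ == "__main__":
    sizes = [("a.jpg", 200, 40), ("b.jpg", 900, 30), ("c.jpg", 320, 48)]
    print(squeezed_crops(sizes))  # only b.jpg (ratio 30.0) is too wide
```

Crops listed here are the ones the advice about overly wide inputs applies to; the threshold moves if image_shape changes.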

saurabhmali1 commented 1 week ago

@UserWangZz What should the values of image_shape be? Is it the maximum height and width of the dataset images? Whenever I change it to [3, 50, 320], it throws an assertion error. These are some examples of my dataset images.
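As I understand RecResizeImg, image_shape is [channels, height, width] of the tensor fed to the network, not the largest image size in the dataset: each image is resized to height H keeping its aspect ratio, its width is capped at W, the remainder is zero-padded on the right, and valid_ratio records the unpadded fraction. A pure-Python sketch of that size computation (no actual pixel resizing); `rec_resize_plan` is a hypothetical name and the logic is my reading, not PaddleOCR's source.

```python
import math
from typing import Tuple


def rec_resize_plan(width: int, height: int,
                    image_shape: Tuple[int, int, int] = (3, 48, 320)) -> Tuple[int, float]:
    """Return (resized_width, valid_ratio) for an image of the given size."""
    _, img_h, img_w = image_shape
    ratio = width / float(height)
    # Keep the aspect ratio at the fixed height, but never exceed img_w.
    resized_w = min(img_w, math.ceil(img_h * ratio))
    # Fraction of the padded width that holds real image content.
    valid_ratio = min(1.0, resized_w / img_w)
    return resized_w, valid_ratio


if __name__ == "__main__":
    print(rec_resize_plan(100, 32))    # short word: (150, 0.46875), padded
    print(rec_resize_plan(1000, 40))   # long line: (320, 1.0), squeezed
```

Under this reading, every image ends up 48 pixels high regardless of its original size. The config above also gives RecConAug its own image_shape [48, 320, 3] in height, width, channels order, which describes the same target size.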

PaddlePaddle / PaddleOCR issue #12059