PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Two-stage seal curved-text recognition training: acc stays at 0. Should training use the PP-OCRv3 config file or the svtrnet_ch config file? #8191

Closed phb-shiyige-fw closed 9 months ago

phb-shiyige-fw commented 2 years ago

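Please provide the following complete information to quickly locate the problem

Following the steps in the curved-text recognition PR, I put my annotated dataset into the recognition model. (A side note here: the images for recognition training have to be crops of the text inside the seal.) Horizontal text crops work well, but for curved text I can only crop with the minimum four-point box. The results are shown below, and I am not sure whether they are adequate for training.

[Images: circle_Aug000080, circle_Aug000081, circle_Aug000082, circle_Aug000083] Do crops like these meet the requirements for training images?

Back to the main point: following the tutorial, I used ch_PP-OCRv3_rec.yml to configure recognition training, but I do not know which settings need to be changed. My train.log is below. I converted the seal dataset (500 train + 200 test images) into a cropped-text dataset (1907 train + 693 test images) and trained on it. acc stays at 0.0 throughout; it only moved slightly in the last few epochs (180-200), and the final best_acc was 0.01. What do I need to change to reach the 61% accuracy reported in the md document?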

[2022/11/02 11:29:03] ppocr INFO: load pretrain successful from ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:03] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2022/11/02 11:29:54] ppocr INFO: Architecture :
[2022/11/02 11:29:54] ppocr INFO:     Backbone :
[2022/11/02 11:29:54] ppocr INFO:         last_conv_stride : [1, 2]
[2022/11/02 11:29:54] ppocr INFO:         last_pool_type : avg
[2022/11/02 11:29:54] ppocr INFO:         name : MobileNetV1Enhance
[2022/11/02 11:29:54] ppocr INFO:         scale : 0.5
[2022/11/02 11:29:54] ppocr INFO:     Head :
[2022/11/02 11:29:54] ppocr INFO:         head_list :
[2022/11/02 11:29:54] ppocr INFO:             CTCHead :
[2022/11/02 11:29:54] ppocr INFO:                 Head :
[2022/11/02 11:29:54] ppocr INFO:                     fc_decay : 1e-05
[2022/11/02 11:29:54] ppocr INFO:                 Neck :
[2022/11/02 11:29:54] ppocr INFO:                     depth : 2
[2022/11/02 11:29:54] ppocr INFO:                     dims : 64
[2022/11/02 11:29:54] ppocr INFO:                     hidden_dims : 120
[2022/11/02 11:29:54] ppocr INFO:                     name : svtr
[2022/11/02 11:29:54] ppocr INFO:                     use_guide : True
[2022/11/02 11:29:54] ppocr INFO:             SARHead :
[2022/11/02 11:29:54] ppocr INFO:                 enc_dim : 512
[2022/11/02 11:29:54] ppocr INFO:                 max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:         name : MultiHead
[2022/11/02 11:29:54] ppocr INFO:     Transform :
[2022/11/02 11:29:54] ppocr INFO:         name : STN_ON
[2022/11/02 11:29:54] ppocr INFO:         num_control_points : 20
[2022/11/02 11:29:54] ppocr INFO:         stn_activation : none
[2022/11/02 11:29:54] ppocr INFO:         tps_inputsize : [32, 64]
[2022/11/02 11:29:54] ppocr INFO:         tps_margins : [0.05, 0.05]
[2022/11/02 11:29:54] ppocr INFO:         tps_outputsize : [32, 100]
[2022/11/02 11:29:54] ppocr INFO:     algorithm : SVTR
[2022/11/02 11:29:54] ppocr INFO:     model_type : rec
[2022/11/02 11:29:54] ppocr INFO: Eval :
[2022/11/02 11:29:54] ppocr INFO:     dataset :
[2022/11/02 11:29:54] ppocr INFO:         data_dir : ./sealdatasets/data/dataset/
[2022/11/02 11:29:54] ppocr INFO:         label_file_list : ['./sealdatasets/data/dataset/test/test.txt']
[2022/11/02 11:29:54] ppocr INFO:         name : SimpleDataSet
[2022/11/02 11:29:54] ppocr INFO:         transforms :
[2022/11/02 11:29:54] ppocr INFO:             DecodeImage :
[2022/11/02 11:29:54] ppocr INFO:                 channel_first : False
[2022/11/02 11:29:54] ppocr INFO:                 img_mode : BGR
[2022/11/02 11:29:54] ppocr INFO:             MultiLabelEncode : None
[2022/11/02 11:29:54] ppocr INFO:             RecResizeImg :
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/02 11:29:54] ppocr INFO:             KeepKeys :
[2022/11/02 11:29:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/02 11:29:54] ppocr INFO:     loader :
[2022/11/02 11:29:54] ppocr INFO:         batch_size_per_card : 128
[2022/11/02 11:29:54] ppocr INFO:         drop_last : False
[2022/11/02 11:29:54] ppocr INFO:         num_workers : 2
[2022/11/02 11:29:54] ppocr INFO:         shuffle : False
[2022/11/02 11:29:54] ppocr INFO: Global :
[2022/11/02 11:29:54] ppocr INFO:     cal_metric_during_train : True
[2022/11/02 11:29:54] ppocr INFO:     character_dict_path : ppocr/utils/ppocr_keys_v1.txt
[2022/11/02 11:29:54] ppocr INFO:     checkpoints : None
[2022/11/02 11:29:54] ppocr INFO:     debug : False
[2022/11/02 11:29:54] ppocr INFO:     distributed : False
[2022/11/02 11:29:54] ppocr INFO:     epoch_num : 200
[2022/11/02 11:29:54] ppocr INFO:     eval_batch_step : [0, 2000]
[2022/11/02 11:29:54] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2022/11/02 11:29:54] ppocr INFO:     infer_mode : False
[2022/11/02 11:29:54] ppocr INFO:     log_smooth_window : 20
[2022/11/02 11:29:54] ppocr INFO:     max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:     pretrained_model : ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:54] ppocr INFO:     print_batch_step : 10
[2022/11/02 11:29:54] ppocr INFO:     save_epoch_step : 1
[2022/11/02 11:29:54] ppocr INFO:     save_inference_dir : None
[2022/11/02 11:29:54] ppocr INFO:     save_model_dir : ./output/rec_svtr500
[2022/11/02 11:29:54] ppocr INFO:     save_res_path : ./output/rec/predicts_svtr.txt
[2022/11/02 11:29:54] ppocr INFO:     use_gpu : True
[2022/11/02 11:29:54] ppocr INFO:     use_space_char : False
[2022/11/02 11:29:54] ppocr INFO:     use_visualdl : False
[2022/11/02 11:29:54] ppocr INFO: Loss :
[2022/11/02 11:29:54] ppocr INFO:     loss_config_list :
[2022/11/02 11:29:54] ppocr INFO:         CTCLoss : None
[2022/11/02 11:29:54] ppocr INFO:         SARLoss : None
[2022/11/02 11:29:54] ppocr INFO:     name : MultiLoss
[2022/11/02 11:29:54] ppocr INFO: Metric :
[2022/11/02 11:29:54] ppocr INFO:     ignore_space : False
[2022/11/02 11:29:54] ppocr INFO:     main_indicator : acc
[2022/11/02 11:29:54] ppocr INFO:     name : RecMetric
[2022/11/02 11:29:54] ppocr INFO: Optimizer :
[2022/11/02 11:29:54] ppocr INFO:     beta1 : 0.9
[2022/11/02 11:29:54] ppocr INFO:     beta2 : 0.999
[2022/11/02 11:29:54] ppocr INFO:     lr :
[2022/11/02 11:29:54] ppocr INFO:         learning_rate : 0.00025
[2022/11/02 11:29:54] ppocr INFO:         name : Cosine
[2022/11/02 11:29:54] ppocr INFO:         warmup_epoch : 10
[2022/11/02 11:29:54] ppocr INFO:     name : Adam
[2022/11/02 11:29:54] ppocr INFO:     regularizer :
[2022/11/02 11:29:54] ppocr INFO:         factor : 3e-05
[2022/11/02 11:29:54] ppocr INFO:         name : L2
[2022/11/02 11:29:54] ppocr INFO: PostProcess :
[2022/11/02 11:29:54] ppocr INFO:     name : CTCLabelDecode
[2022/11/02 11:29:54] ppocr INFO: Train :
[2022/11/02 11:29:54] ppocr INFO:     dataset :
[2022/11/02 11:29:54] ppocr INFO:         data_dir : ./sealdatasets/data/dataset/
[2022/11/02 11:29:54] ppocr INFO:         ext_op_transform_idx : 1
[2022/11/02 11:29:54] ppocr INFO:         label_file_list : ['./sealdatasets/data/dataset/train/train.txt']
[2022/11/02 11:29:54] ppocr INFO:         name : SimpleDataSet
[2022/11/02 11:29:54] ppocr INFO:         transforms :
[2022/11/02 11:29:54] ppocr INFO:             DecodeImage :
[2022/11/02 11:29:54] ppocr INFO:                 channel_first : False
[2022/11/02 11:29:54] ppocr INFO:                 img_mode : BGR
[2022/11/02 11:29:54] ppocr INFO:             RecConAug :
[2022/11/02 11:29:54] ppocr INFO:                 ext_data_num : 2
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [48, 320, 3]
[2022/11/02 11:29:54] ppocr INFO:                 max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:                 prob : 0.5
[2022/11/02 11:29:54] ppocr INFO:             RecAug : None
[2022/11/02 11:29:54] ppocr INFO:             MultiLabelEncode : None
[2022/11/02 11:29:54] ppocr INFO:             RecResizeImg :
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/02 11:29:54] ppocr INFO:             KeepKeys :
[2022/11/02 11:29:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/02 11:29:54] ppocr INFO:     loader :
[2022/11/02 11:29:54] ppocr INFO:         batch_size_per_card : 128
[2022/11/02 11:29:54] ppocr INFO:         drop_last : True
[2022/11/02 11:29:54] ppocr INFO:         num_workers : 4
[2022/11/02 11:29:54] ppocr INFO:         shuffle : True
[2022/11/02 11:29:54] ppocr INFO: profiler_options : None
[2022/11/02 11:29:54] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/02 11:29:54] ppocr INFO: Initialize indexs of datasets:['./sealdatasets/data/dataset/train/train.txt']
[2022/11/02 11:29:54] ppocr INFO: Initialize indexs of datasets:['./sealdatasets/data/dataset/test/test.txt']
[2022/11/02 11:29:55] ppocr INFO: train dataloader has 14 iters
[2022/11/02 11:29:55] ppocr INFO: valid dataloader has 6 iters
[2022/11/02 11:29:55] ppocr INFO: load pretrain successful from ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:55] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
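The iteration counts in this log can be checked with a little arithmetic, which also shows how rarely the model is evaluated with these settings. The sketch below uses only numbers from the post and the log (1907 and 693 samples, batch size 128, 200 epochs, eval_batch_step [0, 2000]). The rule for when evaluation fires is an assumption based on the log message "an evaluation is run every 2000 iterations", not something confirmed in this thread.

```python
import math

# Values read from the post and the train.log.
train_samples = 1907
eval_samples = 693
batch_size = 128                  # batch_size_per_card, single GPU
epoch_num = 200
eval_start, eval_every = 0, 2000  # eval_batch_step : [0, 2000]

# Train loader: drop_last is True, so the partial final batch is discarded.
train_iters = train_samples // batch_size
# Eval loader: drop_last is False, so the partial final batch is kept.
eval_iters = math.ceil(eval_samples / batch_size)

total_iters = train_iters * epoch_num

# Assumption: evaluation runs at every multiple of eval_every after eval_start.
eval_steps = list(range(eval_start + eval_every, total_iters + 1, eval_every))

print("train iters per epoch:", train_iters)
print("eval iters:", eval_iters)
print("total training iterations:", total_iters)
print("scheduled evaluation steps:", eval_steps)
```

The first two values match the "14 iters" and "6 iters" lines in the log. Under the stated assumption, a 200-epoch run of 2800 iterations reaches only one scheduled evaluation, at step 2000 (roughly epoch 143), so a smaller second value in eval_batch_step would give feedback much earlier.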

phb-shiyige-fw commented 2 years ago

@LDOUBLEV I hope you can help me with this. Thank you.

xialei2821212670 commented 2 years ago

I use PP-OCRv3.

phb-shiyige-fw commented 2 years ago

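> I use PP-OCRv3.

Is there anything I should watch out for in the parameter configuration? After training, my accuracy stays at 0. My train.log is attached; could you take a look? Attachment: train.log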

xialei2821212670 commented 2 years ago

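By default you do not need to change any settings; just change the dataset paths. Training is fairly slow, though. It looks like the problem is in how the data was cropped.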

phb-shiyige-fw commented 2 years ago

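> By default you do not need to change any settings; just change the dataset paths. Training is fairly slow, though. It looks like the problem is in how the data was cropped.

Then should TPS be added? The Transform section was originally empty. I added the following:
name : STN_ON
num_control_points : 20
stn_activation : none
tps_inputsize : [32, 64]
tps_margins : [0.05, 0.05]
tps_outputsize : [32, 100]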
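One thing that can be checked mechanically is whether the added Transform block agrees with the rest of the pipeline about image size. The sketch below copies the values from the posted log into plain Python structures and flags the difference between tps_outputsize and the RecResizeImg image_shape. The helper name size_mismatch is hypothetical, and the thread does not establish whether this difference contributes to the zero accuracy; it only marks a departure from the stock config, whose Transform section is empty.

```python
# Values copied from the posted train.log.
transform_cfg = {
    "name": "STN_ON",
    "tps_inputsize": [32, 64],
    "tps_outputsize": [32, 100],
}
rec_resize_shape = [3, 48, 320]  # RecResizeImg image_shape as (C, H, W)


def size_mismatch(transform, image_shape):
    """Hypothetical helper: return ((tps_h, tps_w), (resize_h, resize_w))
    when the TPS output size differs from the resize size, else None."""
    out_h, out_w = transform["tps_outputsize"]
    _, height, width = image_shape
    if (out_h, out_w) != (height, width):
        return (out_h, out_w), (height, width)
    return None


mismatch = size_mismatch(transform_cfg, rec_resize_shape)
if mismatch is not None:
    tps_hw, resize_hw = mismatch
    print(f"Transform outputs {tps_hw}, but the data pipeline resizes to {resize_hw}")
```

Removing the Transform block again, as the reply above suggests with "no settings need to be changed", would be the simplest way to rule this out.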

phb-shiyige-fw commented 2 years ago

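> By default you do not need to change any settings; just change the dataset paths. Training is fairly slow, though. It looks like the problem is in how the data was cropped.

Also, may I ask how much training data you used?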

phb-shiyige-fw commented 2 years ago

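> By default you do not need to change any settings; just change the dataset paths. Training is fairly slow, though. It looks like the problem is in how the data was cropped.

Hello, I re-cropped the dataset as shown below. Some of the images are tilted at an angle. Can they be fed directly into ppocrv3_rec for recognition training? May I also ask how many samples your training set had? [Images: circle_Aug000080, circle_Aug000082, circle_Aug000141, circle_Aug000562]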

xialei2821212670 commented 2 years ago

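> No settings need to be changed; just change the dataset paths, though training is fairly slow.

> Hello, I re-cropped the dataset as shown below. Some of the images are tilted at an angle. Can they be fed directly into ppocrv3_rec for recognition training? [Images: circle_Aug000080, circle_Aug000082, circle_Aug000141, circle_Aug000562]

I cropped out every text region and trained on all of them. These crops do not look very clean.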

phb-shiyige-fw commented 2 years ago

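> I cropped out every text region and trained on all of them. These crops do not look very clean.

How can I get clean crops? These results come from changing the 14-point crop to a 4-point crop, and I do not know how to improve them further. Attachment: cut.txt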
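To see why a 4-point crop of curved text looks "not clean", it helps to compare the area of the 14-point text polygon with the area of the box that encloses it. The sketch below is illustrative only: the arc-shaped polygon is made up (7 points on an outer arc and 7 on an inner arc), and an axis-aligned bounding box stands in for the 4-point crop. It does not reproduce cut.txt.

```python
import math


def arc_band(cx, cy, r_outer, r_inner, start_deg, end_deg, n_per_side=7):
    """Hypothetical 14-point annotation: points along the outer arc,
    then back along the inner arc, forming a closed band."""
    angles = [
        math.radians(start_deg + (end_deg - start_deg) * i / (n_per_side - 1))
        for i in range(n_per_side)
    ]
    outer = [(cx + r_outer * math.cos(a), cy - r_outer * math.sin(a)) for a in angles]
    inner = [(cx + r_inner * math.cos(a), cy - r_inner * math.sin(a)) for a in reversed(angles)]
    return outer + inner


def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box_area(points):
    """Area of the axis-aligned box enclosing all points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Text running along the upper half of a seal with outer radius 100.
polygon = arc_band(cx=128, cy=128, r_outer=100, r_inner=70, start_deg=180, end_deg=0)
fill_ratio = polygon_area(polygon) / bounding_box_area(polygon)
print(f"text band covers {fill_ratio:.0%} of the enclosing box")
```

For this made-up half-circle band, well under half of the box is text; the rest is seal interior and background that the recognizer also has to read. Masking out pixels outside the polygon before cropping, or unwarping the arc into a straight strip, keeps more of the crop on the text.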

dengmingD commented 1 year ago

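> I cropped out every text region and trained on all of them. These crops do not look very clean.

> How can I get clean crops? These results come from changing the 14-point crop to a 4-point crop, and I do not know how to improve them further. Attachment: cut.txt

This code has a bug: the mask has to be defined inside the for loop, like this:
for i in range(nsize):
    mask = np.zeros((height, width), dtype=np.uint8)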
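The effect of where the mask is created can be shown without the original script. The sketch below is a generic illustration, not a reconstruction of cut.txt: it uses nested lists in place of np.zeros and two made-up rectangular regions in place of text polygons. When one mask is shared across iterations, each crop also contains every region filled before it; a fresh mask per iteration isolates the current region.

```python
def make_mask(height, width):
    """Blank single-channel mask (stand-in for np.zeros((height, width), dtype=np.uint8))."""
    return [[0] * width for _ in range(height)]


def fill_rect(mask, x0, y0, x1, y1):
    """Mark a rectangular region as foreground (stand-in for filling a text polygon)."""
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[y][x] = 255


def count_foreground(mask):
    return sum(value != 0 for row in mask for value in row)


height, width = 6, 6
regions = [(0, 0, 2, 2), (3, 3, 5, 5)]  # hypothetical text regions, 4 pixels each

# Buggy pattern: the mask is created once, so regions accumulate.
shared_mask = make_mask(height, width)
buggy_counts = []
for region in regions:
    fill_rect(shared_mask, *region)
    buggy_counts.append(count_foreground(shared_mask))

# Fixed pattern: a fresh mask is created inside the loop for every region.
fixed_counts = []
for region in regions:
    mask = make_mask(height, width)
    fill_rect(mask, *region)
    fixed_counts.append(count_foreground(mask))

print("shared mask:", buggy_counts)
print("fresh mask per region:", fixed_counts)
```

With the shared mask, the second count includes the first region's pixels, which is how text from a neighbouring line can leak into a crop.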

sisrfeng commented 1 year ago

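A similar problem:

doc/doc_ch/recognition.md pointed me to configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Using this config file, with character_dict_path set to the chars.txt downloaded from the ICDAR website, I trained on 16,000 seal images that I generated myself, modeled on ICDAR 2023 ReST (Seal Title). acc stays at 0.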
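When a custom dictionary is involved, a quick sanity check of the label file against the dictionary can rule out one class of problems. The sketch below is a minimal check under stated assumptions: label lines have the form image path, a tab, then the text (the format used by the SimpleDataSet label files in the PaddleOCR recognition docs); the dictionary holds one character per line; and labels longer than max_text_length (25 in the log earlier in this thread) or containing characters outside the dictionary are candidates for being dropped or altered during label encoding. The function name check_labels and the sample data are hypothetical, and the thread does not confirm that this is the cause of acc staying at 0.

```python
def check_labels(label_lines, charset, max_text_length=25):
    """Return (missing_chars, too_long_paths) for 'path<TAB>text' label lines."""
    missing_chars = set()
    too_long_paths = []
    for line in label_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        path, _, text = line.partition("\t")
        # Characters that the dictionary cannot represent.
        missing_chars.update(ch for ch in text if ch not in charset)
        # Labels that exceed the configured maximum length.
        if len(text) > max_text_length:
            too_long_paths.append(path)
    return missing_chars, too_long_paths


# Made-up example data; in practice read these from the label file and chars.txt.
charset = set("印章公司")
label_lines = ["a.jpg\t公司印章", "b.jpg\t公司专用章"]

missing, too_long = check_labels(label_lines, charset, max_text_length=4)
print("characters missing from the dictionary:", sorted(missing))
print("labels longer than the limit:", too_long)
```

Seal titles are often long, so the length check is worth running against the max_text_length value actually set in the config.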

344089386 commented 1 year ago

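> A similar problem:

> doc/doc_ch/recognition.md pointed me to configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Using this config file, with character_dict_path set to the chars.txt downloaded from the ICDAR website, I trained on 16,000 seal images that I generated myself, modeled on ICDAR 2023 ReST (Seal Title). acc stays at 0.

Same here: my eval acc stays at 0, and I do not know why.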

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.