Two-stage seal curved-text recognition training: acc stays at 0. Should I train with the PPOCRv3 config file or the svtrnet_ch config file?

phb-shiyige-fw commented 2 years ago

Please provide the following information to quickly locate the problem

System Environment: WIN10
Version: Paddle 2.3.2, PaddleOCR 2.6
Related components:
Command Code: python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
Complete Error Message:

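Following the steps in the curved-text recognition PR, I put my already-annotated dataset into the recognition model. (A side question here: recognition training images have to be crops of the text inside the seal.) Horizontal text crops fine, but for curved text I can only crop with the minimal four-point box, which gives results like those below. I'm not sure whether these are good enough for training.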

[Images: circle_Aug000080, circle_Aug000081, circle_Aug000082, circle_Aug000083] Are crops like these acceptable as training images?
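For reference, a four-point crop like the one described above is usually done with a perspective warp that maps the quadrilateral onto an upright rectangle. The sketch below is a NumPy-only version with nearest-neighbour sampling. The helper names `homography` and `crop_quad` are hypothetical, and it assumes the four points are given as (x, y) in the order top-left, top-right, bottom-right, bottom-left. As far as I know, PaddleOCR's inference code does the equivalent with OpenCV's perspective-transform functions. For strongly curved seal text, any single quadrilateral will either clip characters or include background; this sketch does not address that.

```python
import numpy as np


def homography(src, dst):
    """Solve for the 3x3 matrix H mapping four src points onto four dst points (h33 = 1)."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def crop_quad(image, quad):
    """Warp a quadrilateral (tl, tr, br, bl), given as (x, y) points, to an upright rectangle."""
    quad = np.asarray(quad, dtype=float)
    tl, tr, br, bl = quad
    # Output size: the longer edge of each pair of opposite edges.
    width = int(round(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl))))
    height = int(round(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr))))
    rect = np.array([[0, 0], [width, 0], [width, height], [0, height]], dtype=float)
    # Inverse mapping: for every output pixel centre, find its source location.
    back = homography(rect, quad)
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5, np.ones(xs.size)])
    sx, sy, sw = back @ pts
    # Nearest-neighbour sampling, clamped to the image border.
    col = np.clip(np.floor(sx / sw).astype(int), 0, image.shape[1] - 1)
    row = np.clip(np.floor(sy / sw).astype(int), 0, image.shape[0] - 1)
    return image[row, col].reshape(height, width, *image.shape[2:])
```

For example, `crop_quad(img, [(2, 3), (6, 3), (6, 5), (2, 5)])` returns the same pixels as `img[3:5, 2:6]`, because that quadrilateral is just an axis-aligned box.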

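Back to the main question: following the tutorial, I used ch_PP-OCRv3_rec.yml for the recognition configuration, but I don't know which settings need to change. Below is my train.log. After converting the seal dataset (500 train + 200 test) into a cropped-text dataset (1907 train + 693 test) and training on it, acc stayed at 0.0 the whole time; only in the last few epochs (180-200) did it fluctuate slightly, and the final best_acc was 0.01. What do I need to change to reach the 61% accuracy reported in the md?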

[2022/11/02 11:29:03] ppocr INFO: load pretrain successful from ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:03] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
[2022/11/02 11:29:54] ppocr INFO: Architecture :
[2022/11/02 11:29:54] ppocr INFO:     Backbone :
[2022/11/02 11:29:54] ppocr INFO:         last_conv_stride : [1, 2]
[2022/11/02 11:29:54] ppocr INFO:         last_pool_type : avg
[2022/11/02 11:29:54] ppocr INFO:         name : MobileNetV1Enhance
[2022/11/02 11:29:54] ppocr INFO:         scale : 0.5
[2022/11/02 11:29:54] ppocr INFO:     Head :
[2022/11/02 11:29:54] ppocr INFO:         head_list :
[2022/11/02 11:29:54] ppocr INFO:             CTCHead :
[2022/11/02 11:29:54] ppocr INFO:                 Head :
[2022/11/02 11:29:54] ppocr INFO:                     fc_decay : 1e-05
[2022/11/02 11:29:54] ppocr INFO:                 Neck :
[2022/11/02 11:29:54] ppocr INFO:                     depth : 2
[2022/11/02 11:29:54] ppocr INFO:                     dims : 64
[2022/11/02 11:29:54] ppocr INFO:                     hidden_dims : 120
[2022/11/02 11:29:54] ppocr INFO:                     name : svtr
[2022/11/02 11:29:54] ppocr INFO:                     use_guide : True
[2022/11/02 11:29:54] ppocr INFO:             SARHead :
[2022/11/02 11:29:54] ppocr INFO:                 enc_dim : 512
[2022/11/02 11:29:54] ppocr INFO:                 max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:         name : MultiHead
[2022/11/02 11:29:54] ppocr INFO:     Transform :
[2022/11/02 11:29:54] ppocr INFO:         name : STN_ON
[2022/11/02 11:29:54] ppocr INFO:         num_control_points : 20
[2022/11/02 11:29:54] ppocr INFO:         stn_activation : none
[2022/11/02 11:29:54] ppocr INFO:         tps_inputsize : [32, 64]
[2022/11/02 11:29:54] ppocr INFO:         tps_margins : [0.05, 0.05]
[2022/11/02 11:29:54] ppocr INFO:         tps_outputsize : [32, 100]
[2022/11/02 11:29:54] ppocr INFO:     algorithm : SVTR
[2022/11/02 11:29:54] ppocr INFO:     model_type : rec
[2022/11/02 11:29:54] ppocr INFO: Eval :
[2022/11/02 11:29:54] ppocr INFO:     dataset :
[2022/11/02 11:29:54] ppocr INFO:         data_dir : ./sealdatasets/data/dataset/
[2022/11/02 11:29:54] ppocr INFO:         label_file_list : ['./sealdatasets/data/dataset/test/test.txt']
[2022/11/02 11:29:54] ppocr INFO:         name : SimpleDataSet
[2022/11/02 11:29:54] ppocr INFO:         transforms :
[2022/11/02 11:29:54] ppocr INFO:             DecodeImage :
[2022/11/02 11:29:54] ppocr INFO:                 channel_first : False
[2022/11/02 11:29:54] ppocr INFO:                 img_mode : BGR
[2022/11/02 11:29:54] ppocr INFO:             MultiLabelEncode : None
[2022/11/02 11:29:54] ppocr INFO:             RecResizeImg :
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/02 11:29:54] ppocr INFO:             KeepKeys :
[2022/11/02 11:29:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/02 11:29:54] ppocr INFO:     loader :
[2022/11/02 11:29:54] ppocr INFO:         batch_size_per_card : 128
[2022/11/02 11:29:54] ppocr INFO:         drop_last : False
[2022/11/02 11:29:54] ppocr INFO:         num_workers : 2
[2022/11/02 11:29:54] ppocr INFO:         shuffle : False
[2022/11/02 11:29:54] ppocr INFO: Global :
[2022/11/02 11:29:54] ppocr INFO:     cal_metric_during_train : True
[2022/11/02 11:29:54] ppocr INFO:     character_dict_path : ppocr/utils/ppocr_keys_v1.txt
[2022/11/02 11:29:54] ppocr INFO:     checkpoints : None
[2022/11/02 11:29:54] ppocr INFO:     debug : False
[2022/11/02 11:29:54] ppocr INFO:     distributed : False
[2022/11/02 11:29:54] ppocr INFO:     epoch_num : 200
[2022/11/02 11:29:54] ppocr INFO:     eval_batch_step : [0, 2000]
[2022/11/02 11:29:54] ppocr INFO:     infer_img : doc/imgs_words/ch/word_1.jpg
[2022/11/02 11:29:54] ppocr INFO:     infer_mode : False
[2022/11/02 11:29:54] ppocr INFO:     log_smooth_window : 20
[2022/11/02 11:29:54] ppocr INFO:     max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:     pretrained_model : ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:54] ppocr INFO:     print_batch_step : 10
[2022/11/02 11:29:54] ppocr INFO:     save_epoch_step : 1
[2022/11/02 11:29:54] ppocr INFO:     save_inference_dir : None
[2022/11/02 11:29:54] ppocr INFO:     save_model_dir : ./output/rec_svtr500
[2022/11/02 11:29:54] ppocr INFO:     save_res_path : ./output/rec/predicts_svtr.txt
[2022/11/02 11:29:54] ppocr INFO:     use_gpu : True
[2022/11/02 11:29:54] ppocr INFO:     use_space_char : False
[2022/11/02 11:29:54] ppocr INFO:     use_visualdl : False
[2022/11/02 11:29:54] ppocr INFO: Loss :
[2022/11/02 11:29:54] ppocr INFO:     loss_config_list :
[2022/11/02 11:29:54] ppocr INFO:         CTCLoss : None
[2022/11/02 11:29:54] ppocr INFO:         SARLoss : None
[2022/11/02 11:29:54] ppocr INFO:     name : MultiLoss
[2022/11/02 11:29:54] ppocr INFO: Metric :
[2022/11/02 11:29:54] ppocr INFO:     ignore_space : False
[2022/11/02 11:29:54] ppocr INFO:     main_indicator : acc
[2022/11/02 11:29:54] ppocr INFO:     name : RecMetric
[2022/11/02 11:29:54] ppocr INFO: Optimizer :
[2022/11/02 11:29:54] ppocr INFO:     beta1 : 0.9
[2022/11/02 11:29:54] ppocr INFO:     beta2 : 0.999
[2022/11/02 11:29:54] ppocr INFO:     lr :
[2022/11/02 11:29:54] ppocr INFO:         learning_rate : 0.00025
[2022/11/02 11:29:54] ppocr INFO:         name : Cosine
[2022/11/02 11:29:54] ppocr INFO:         warmup_epoch : 10
[2022/11/02 11:29:54] ppocr INFO:     name : Adam
[2022/11/02 11:29:54] ppocr INFO:     regularizer :
[2022/11/02 11:29:54] ppocr INFO:         factor : 3e-05
[2022/11/02 11:29:54] ppocr INFO:         name : L2
[2022/11/02 11:29:54] ppocr INFO: PostProcess :
[2022/11/02 11:29:54] ppocr INFO:     name : CTCLabelDecode
[2022/11/02 11:29:54] ppocr INFO: Train :
[2022/11/02 11:29:54] ppocr INFO:     dataset :
[2022/11/02 11:29:54] ppocr INFO:         data_dir : ./sealdatasets/data/dataset/
[2022/11/02 11:29:54] ppocr INFO:         ext_op_transform_idx : 1
[2022/11/02 11:29:54] ppocr INFO:         label_file_list : ['./sealdatasets/data/dataset/train/train.txt']
[2022/11/02 11:29:54] ppocr INFO:         name : SimpleDataSet
[2022/11/02 11:29:54] ppocr INFO:         transforms :
[2022/11/02 11:29:54] ppocr INFO:             DecodeImage :
[2022/11/02 11:29:54] ppocr INFO:                 channel_first : False
[2022/11/02 11:29:54] ppocr INFO:                 img_mode : BGR
[2022/11/02 11:29:54] ppocr INFO:             RecConAug :
[2022/11/02 11:29:54] ppocr INFO:                 ext_data_num : 2
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [48, 320, 3]
[2022/11/02 11:29:54] ppocr INFO:                 max_text_length : 25
[2022/11/02 11:29:54] ppocr INFO:                 prob : 0.5
[2022/11/02 11:29:54] ppocr INFO:             RecAug : None
[2022/11/02 11:29:54] ppocr INFO:             MultiLabelEncode : None
[2022/11/02 11:29:54] ppocr INFO:             RecResizeImg :
[2022/11/02 11:29:54] ppocr INFO:                 image_shape : [3, 48, 320]
[2022/11/02 11:29:54] ppocr INFO:             KeepKeys :
[2022/11/02 11:29:54] ppocr INFO:                 keep_keys : ['image', 'label_ctc', 'label_sar', 'length', 'valid_ratio']
[2022/11/02 11:29:54] ppocr INFO:     loader :
[2022/11/02 11:29:54] ppocr INFO:         batch_size_per_card : 128
[2022/11/02 11:29:54] ppocr INFO:         drop_last : True
[2022/11/02 11:29:54] ppocr INFO:         num_workers : 4
[2022/11/02 11:29:54] ppocr INFO:         shuffle : True
[2022/11/02 11:29:54] ppocr INFO: profiler_options : None
[2022/11/02 11:29:54] ppocr INFO: train with paddle 2.3.2 and device Place(gpu:0)
[2022/11/02 11:29:54] ppocr INFO: Initialize indexs of datasets:['./sealdatasets/data/dataset/train/train.txt']
[2022/11/02 11:29:54] ppocr INFO: Initialize indexs of datasets:['./sealdatasets/data/dataset/test/test.txt']
[2022/11/02 11:29:55] ppocr INFO: train dataloader has 14 iters
[2022/11/02 11:29:55] ppocr INFO: valid dataloader has 6 iters
[2022/11/02 11:29:55] ppocr INFO: load pretrain successful from ./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
[2022/11/02 11:29:55] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 2000 iterations
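The two dataloader sizes in the log follow directly from the dataset sizes, and together with eval_batch_step they show how often validation actually runs in this setup. A small sketch of that arithmetic, assuming evaluation fires when `global_step > start` and `(global_step - start) % interval == 0` (my reading of how PaddleOCR's training loop uses eval_batch_step, not confirmed in this thread); `loader_iters` is a hypothetical helper:

```python
import math


def loader_iters(num_samples, batch_size, drop_last):
    """Batches per epoch: a partial last batch is dropped only when drop_last is True."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Values taken from the train.log above and the dataset sizes in the post.
train_iters = loader_iters(1907, 128, drop_last=True)
valid_iters = loader_iters(693, 128, drop_last=False)
epoch_num = 200
start, interval = 0, 2000  # eval_batch_step : [0, 2000]

total_steps = epoch_num * train_iters
# Assumed evaluation condition (see the paragraph above).
eval_steps = [s for s in range(1, total_steps + 1)
              if s > start and (s - start) % interval == 0]

print(train_iters, valid_iters, total_steps, eval_steps)  # → 14 6 2800 [2000]
```

Under that assumption the whole run is 2800 steps with a single validation pass at step 2000 (around epoch 143), so the evaluated accuracy is refreshed only once; on a dataset this small, a smaller interval would give much more frequent feedback.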

phb-shiyige-fw commented 2 years ago

@LDOUBLEV I'd really appreciate your help, thanks.

xialei2821212670 commented 2 years ago

I'm using PPOCRv3.

phb-shiyige-fw commented 2 years ago

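> I'm using PPOCRv3.

Are there any configuration settings I should pay attention to? My accuracy stays at 0 after training. My train.log is attached; could you take a look? (Attachment: train.log)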

xialei2821212670 commented 2 years ago

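By default you don't need to change any config; just change the dataset paths. Training is fairly slow, though. It looks like there's a problem with how the data was cropped.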

phb-shiyige-fw commented 2 years ago

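> By default you don't need to change any config; just change the dataset paths. Training is fairly slow, though. It looks like there's a problem with how the data was cropped.

So should TPS be added? The Transform section was originally empty; I added name : STN_ON, num_control_points : 20, stn_activation : none, tps_inputsize : [32, 64], tps_margins : [0.05, 0.05], tps_outputsize : [32, 100].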

phb-shiyige-fw commented 2 years ago

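> By default you don't need to change any config; just change the dataset paths. Training is fairly slow, though. It looks like there's a problem with how the data was cropped.

Also, may I ask how much training data you used?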

phb-shiyige-fw commented 2 years ago

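> By default you don't need to change any config; just change the dataset paths. Training is fairly slow, though. It looks like there's a problem with how the data was cropped.

Hi, I re-cropped the dataset as shown below. Some of the images are tilted at an angle; can they go straight into ppocrv3_rec recognition training? And could you tell me how many images your training dataset had? [Images: circle_Aug000080, circle_Aug000082, circle_Aug000141, circle_Aug000562]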

xialei2821212670 commented 2 years ago

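> No need to change any config; just change the dataset paths, but it's fairly slow.

> Hi, I re-cropped the dataset as shown below. Some of the images are tilted at an angle; can they be used directly for training?

I cropped out all of the text and trained on it. These don't look very cleanly cropped.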

phb-shiyige-fw commented 2 years ago

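> I cropped out all of the text and trained on it. These don't look very cleanly cropped.

How do I get clean crops? I got these results by changing the 14-point crop to a 4-point crop, and I don't know how to improve it further. (Attachment: cut.txt)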

dengmingD commented 1 year ago

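> I cropped out all of the text and trained on it. These don't look very cleanly cropped.

> How do I get clean crops? I got these results by changing the 14-point crop to a 4-point crop, and I don't know how to improve it further. (Attachment: cut.txt)

That code has a bug: the mask has to be defined inside the for loop, like this:

```python
for i in range(nsize):
    # Create a fresh, empty mask for every region instead of reusing one.
    mask = np.zeros((height, width), dtype=np.uint8)
    # ... rest of the per-region cropping code unchanged
```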
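To illustrate why the mask should be recreated per region: presumably cut.txt fills one shared mask polygon after polygon, so each later crop also lets through the pixels of every earlier polygon. The sketch below crops each polygon with its own fresh mask. It is not cut.txt (which is not reproduced in this thread) but a NumPy-only stand-in for an OpenCV fillPoly-style mask; the helper names `polygon_mask` and `crop_polygons` are hypothetical, polygons are assumed to be lists of (x, y) points, and pixels outside the polygon are assumed to be filled with white (255).

```python
import numpy as np


def polygon_mask(height, width, polygon):
    """Boolean mask of pixels whose centres lie inside `polygon` (even-odd rule)."""
    ys, xs = np.mgrid[0:height, 0:width]
    px, py = xs + 0.5, ys + 0.5
    pts = np.asarray(polygon, dtype=float)
    inside = np.zeros((height, width), dtype=bool)
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        # Does this edge cross the horizontal ray through each pixel centre?
        crosses = (y1 > py) != (y2 > py)
        # x where the edge meets that ray; horizontal edges never "cross",
        # so their division-by-zero results are masked out and warnings silenced.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_hit = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (px < x_hit)
    return inside


def crop_polygons(image, polygons, fill=255):
    """Crop each polygon's bounding box, blanking pixels outside that polygon."""
    height, width = image.shape[:2]
    crops = []
    for polygon in polygons:
        # A fresh mask per polygon, so earlier regions never leak into later crops.
        mask = polygon_mask(height, width, polygon)
        if image.ndim == 3:
            mask = mask[..., None]
        masked = np.where(mask, image, fill).astype(image.dtype)
        pts = np.asarray(polygon, dtype=float)
        x0, y0 = np.maximum(np.floor(pts.min(axis=0)).astype(int), 0)
        x1, y1 = np.ceil(pts.max(axis=0)).astype(int)
        crops.append(masked[y0:min(y1, height), x0:min(x1, width)])
    return crops
```

With a 14-point polygon per text region this keeps the curved outline instead of a four-point box, although the resulting crop is still curved text rather than a straightened line.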

sisrfeng commented 1 year ago

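Similar problem:

doc/doc_ch/recognition.md pointed me to configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Using that config file, with character_dict_path set to the chars.txt downloaded from the ICDAR website, and training on 16,000 seal images I generated myself modeled on ICDAR 2023 ReST (Seal Title), acc stays at 0.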
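One thing worth ruling out here is a mismatch between the label text and character_dict_path. As I understand PaddleOCR's label encoding, characters that are not in the dictionary are silently skipped, and a sample is dropped if its label is longer than max_text_length or ends up empty after filtering. The sketch below counts such cases. It assumes the usual recognition label format (one `image_path<TAB>text` per line) and a dictionary with one character per line; `load_dict` and `audit_labels` are hypothetical helper names, not PaddleOCR APIs.

```python
from collections import Counter
from pathlib import Path


def load_dict(dict_path, use_space_char=False):
    """Read a recognition dictionary with one character per line."""
    chars = set(Path(dict_path).read_text(encoding="utf-8").splitlines())
    chars.discard("")
    if use_space_char:
        chars.add(" ")
    return chars


def audit_labels(label_path, dict_path, max_text_length=25, use_space_char=False):
    """Report label lines whose text would be altered or dropped by dictionary encoding."""
    chars = load_dict(dict_path, use_space_char)
    missing = Counter()
    empty_after_filter, too_long = [], []
    for line in Path(label_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        image_path, _, text = line.partition("\t")
        unknown = [c for c in text if c not in chars]
        missing.update(unknown)
        # Every character unknown: nothing left to train on for this sample.
        if len(unknown) == len(text):
            empty_after_filter.append(image_path)
        if len(text) > max_text_length:
            too_long.append(image_path)
    return {"missing_chars": dict(missing),
            "empty_after_filter": empty_after_filter,
            "too_long": too_long}
```

A large `missing_chars` count, or many entries in `empty_after_filter`, would point to a dictionary/label mismatch rather than a model or config problem.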

344089386 commented 1 year ago

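> Similar problem:

> doc/doc_ch/recognition.md pointed me to configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml. Using that config file, with character_dict_path set to the chars.txt downloaded from the ICDAR website, and training on 16,000 seal images I generated myself modeled on ICDAR 2023 ReST (Seal Title), acc stays at 0.

Same here: eval acc stays at 0, and I don't know why.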

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

PaddlePaddle / PaddleOCR

Two-stage seal curved-text recognition training: acc stays at 0. Should I train with the PPOCRv3 config file or the svtrnet_ch config file? #8191