PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
38.99k stars 7.32k forks

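When evaluating table structure and cell coordinates, the HTML structure prediction acc is 0.999, so why are the bbox detection metrics (precision, recall, etc.) all 0? After some investigation it looks like the ground-truth bboxes are not being read, which makes every bbox metric 0 during eval. How should this be fixed? #12024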

Closed plotnine1219 closed 2 days ago

plotnine1219 commented 2 weeks ago

Please provide the following information to quickly locate the problem

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 5.0
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00000

Architecture:
  model_type: table
  algorithm: SLANet
  Backbone:
    name: PPLCNet
    scale: 1.0
    pretrained: True
    use_ssld: True
  Neck:
    name: CSPPAN
    out_channels: 96
  Head:
    name: SLAHead
    hidden_size: 256
    max_text_length: *max_text_length
    loc_reg_num: &loc_reg_num 8

Loss:
  name: SLALoss
  structure_weight: 1.0
  loc_weight: 2.0
  loc_loss: smooth_l1

PostProcess:
  name: TableLabelDecode
  merge_no_span_structure: &merge_no_span_structure True

Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: True
  loc_reg_num: *loc_reg_num
  box_format: *box_format
  del_thead_tbody: True

Train:
  dataset:
    name: PubTabDataSet
    data_dir: 500_table/
    label_file_list: [500_table/train.txt]
    transforms:

Eval:
  dataset:
    name: PubTabDataSet
    data_dir: 500_table/
    label_file_list: [500_table/val.txt]
    transforms:

Eval results:
[2024/04/29 15:55:18] ppocr INFO: metric eval ***
[2024/04/29 15:55:18] ppocr INFO: acc:0.9999990000010001
[2024/04/29 15:55:18] ppocr INFO: bbox_metric_precision:0.0
[2024/04/29 15:55:18] ppocr INFO: bbox_metric_recall:0
[2024/04/29 15:55:18] ppocr INFO: bbox_metric_hmean:0
[2024/04/29 15:55:18] ppocr INFO: fps:1.2853043121990564

UserWangZz commented 2 weeks ago

Please check whether the annotation format is correct.

plotnine1219 commented 2 weeks ago

> Please check whether the annotation format is correct.

The annotation format is fine.

UserWangZz commented 1 week ago

Hi, could you debug the data loading process and check whether the bboxes are actually being read correctly?

plotnine1219 commented 1 week ago

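> Hi, could you debug the data loading process and check whether the bboxes are actually being read correctly?

Hi, they are read correctly, but during eval the predicted bbox coordinates are matched against bbox_mask for the computation.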

UserWangZz commented 1 week ago

Has the problem been solved?

plotnine1219 commented 1 week ago

> Has the problem been solved?

No...

UserWangZz commented 1 week ago

Hi, could you share the command you ran? I'll look into it.

UserWangZz commented 1 week ago

Which versions of paddle and paddleocr are you using?

plotnine1219 commented 1 week ago

> Which versions of paddle and paddleocr are you using?

Hi, paddleocr-2.7.4, paddle-2.5.1. Config file:

Global:
  use_gpu: False
  epoch_num: 300
  log_smooth_window: 20
  print_batch_step: 20
  save_model_dir: ./output/SLANet_ch/613_no_xuanzhuan_padding_LCPAN
  save_epoch_step: 400
  # evaluation is run every 331 iterations after the 0th iteration
  eval_batch_step: [0, 331]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints:
  save_inference_dir: ./output/SLANet_ch/613_no_xuanzhuan/infer/
  use_visualdl: False
  infer_img: ./500_table/
  # for data or label process
  character_dict_path: ppocr/utils/dict/table_structure_dict_ch.txt
  character_type: en
  max_text_length: &max_text_length 500
  box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
  infer_mode: False
  # use_sync_bn: True
  use_sync_bn: False
  save_res_path: output/infer

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  clip_norm: 5.0
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00000

Architecture:
  model_type: table
  algorithm: SLANet
  Backbone:
    name: PPLCNet
    scale: 1.0
    pretrained: True
    use_ssld: True
  Neck:
    name: LCPAN
    out_channels: 96
  Head:
    name: SLAHead
    hidden_size: 256
    max_text_length: *max_text_length
    loc_reg_num: &loc_reg_num 8

Loss:
  name: SLALoss
  structure_weight: 1.0
  loc_weight: 2.0
  loc_loss: smooth_l1

PostProcess:
  name: TableLabelDecode
  merge_no_span_structure: &merge_no_span_structure True

Metric:
  name: TableMetric
  main_indicator: acc
  compute_bbox_metric: True
  loc_reg_num: *loc_reg_num
  box_format: *box_format
  del_thead_tbody: True

Train:
  dataset:
    name: PubTabDataSet
    data_dir: 500_table_no_xuanzhuan
    label_file_list: [500_table_no_xuanzhuan_padding/train.txt]
    transforms:

Eval:
  dataset:
    name: PubTabDataSet
    data_dir: 500_table_no_xuanzhuan/
    label_file_list: [500_table_no_xuanzhuan_padding/val.txt]
    transforms:

UserWangZz commented 1 week ago

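Hi, could you run inference with tools/infer_table.py and visualize the result to see whether the model output looks normal? Then we can check where the box evaluation goes wrong.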

plotnine1219 commented 1 week ago

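> Hi, could you run inference with tools/infer_table.py and visualize the result to see whether the model output looks normal? Then we can check where the box evaluation goes wrong.

infer_table works fine apart from being inaccurate. The only problem is the three table box metrics in eval, which all come out as 0.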

UserWangZz commented 1 week ago

Try switching to the 2.7 branch. If it still doesn't work, I'll try to reproduce it on my side.

plotnine1219 commented 1 week ago

> Try switching to the 2.7 branch. If it still doesn't work, I'll try to reproduce it on my side.

I just tried 2.7 and it doesn't work either.

UserWangZz commented 1 week ago

OK, I'll try to reproduce it on my side.

plotnine1219 commented 6 days ago

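> OK, I'll try to reproduce it on my side.

On the 2.7 branch, ppocr/data/imaug/label_ops.py, line 718:

    # encode box
    bboxes = np.zeros(
        (self._max_text_len, self.loc_reg_num), dtype=np.float32)

This creates an all-zero 2-D array, which is then used against the predicted boxes in the subsequent IoU and loss computation. I believe this is where the problem lies.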
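To illustrate why ground-truth rows left at zero would push the box metrics to 0, here is a minimal sketch. It assumes the metric pairs predicted and ground-truth boxes by IoU; the `iou_xyxy` helper and the sample coordinates are illustrative and are not taken from PaddleOCR.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against two degenerate boxes, where the union is 0.
    return inter / union if union > 0 else 0.0


pred = (30.0, 30.0, 246.0, 93.0)     # hypothetical predicted cell box
gt_zero = (0.0, 0.0, 0.0, 0.0)       # a row that was never filled in
gt_real = (30.0, 30.0, 246.0, 93.0)  # what the label would hold if read

iou_with_zero = iou_xyxy(pred, gt_zero)
iou_with_real = iou_xyxy(pred, gt_real)
print(iou_with_zero, iou_with_real)
```

A zero-area ground-truth box cannot overlap anything, so no prediction would ever count as a match under an IoU threshold, whatever the model outputs.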

UserWangZz commented 5 days ago

[image] Hi, the result I reproduced on my side is normal.

plotnine1219 commented 5 days ago

> OK, I'll try to reproduce it on my side.

(https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7/ppocr/data/imaug) /label_ops.py, line 816: my understanding is that this is where the bbox inside each cell should be read.

> [image] Hi, the result I reproduced on my side is normal.

[image] Hi, but in our case the boxes are not being trained at all. Here is my annotation:

{"html": {"structure": {"tokens": ["", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]}, "cells": [{"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]}, {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]}, {"tokens": [], "bbox": [[[30, 93], [246, 93], [246, 173], [30, 173]]]}, {"tokens": [], "bbox": [[[246, 93], [1096, 93], [1096, 173], [246, 173]]]}, {"tokens": [], "bbox": [[[30, 173], [246, 173], [246, 295], [30, 295]]]}, {"tokens": [], "bbox": [[[246, 173], [1096, 173], [1096, 295], [246, 295]]]}, {"tokens": [], "bbox": [[[30, 295], [246, 295], [246, 376], [30, 376]]]}, {"tokens": [], "bbox": [[[246, 295], [1096, 295], [1096, 376], [246, 376]]]}, {"tokens": [], "bbox": [[[30, 376], [246, 376], [246, 459], [30, 459]]]}, {"tokens": [], "bbox": [[[246, 376], [1096, 376], [1096, 459], [246, 459]]]}]}, "filename": "(已压缩)AHLY〔2022〕086号 龙源电力安徽来安三湾风电项目风电机组设备采购合同-工程建设部-李纳-2022.8.23(2).pdf180.png_table_1.png"}

The tokens in structure may not be displayed correctly here. [image not uploaded]
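Every cell in the annotation above has a bbox but an empty "tokens" list, so it may be worth checking how common that is across a label file. The following is a minimal sketch; the `summarize_cells` helper and the shortened two-cell sample are hypothetical, and it only assumes one JSON object per label line with the "html" / "cells" layout shown in the annotation.

```python
import json


def summarize_cells(label_line):
    """Count cells that carry a bbox, split by whether 'tokens' is empty."""
    cells = json.loads(label_line)["html"]["cells"]
    with_text = sum(1 for c in cells if "bbox" in c and len(c["tokens"]) > 0)
    empty = sum(1 for c in cells if "bbox" in c and len(c["tokens"]) == 0)
    return {"bbox_with_tokens": with_text, "bbox_empty_tokens": empty}


# Two cells shaped like the annotation above (empty "tokens", 4-point bbox).
sample = json.dumps({
    "html": {
        "structure": {"tokens": []},
        "cells": [
            {"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]},
            {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]},
        ],
    },
    "filename": "example.png",
})

summary = summarize_cells(sample)
print(summary)
```

To scan a real label file, the same helper could be applied to each line of the file and the counts summed.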

UserWangZz commented 5 days ago

box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'

I suspect the problem is here. Try writing it as box_format: 'xyxyxyxy'.

plotnine1219 commented 5 days ago

> box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'

That doesn't work either...

UserWangZz commented 5 days ago

box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy'

Could it be related to the missing quotes?

plotnine1219 commented 4 days ago

> box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy' Could it be related to the missing quotes?

[image not uploaded] The format is read correctly here in the ops.

UserWangZz commented 4 days ago

That's fine❤️❤️❤️

plotnine1219 commented 4 days ago

> That's fine❤️❤️❤️

No, no, it's not solved yet. What I meant just now is that the format is read correctly even without the quotes.
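On the quoting question, a short check with PyYAML suggests that a plain (unquoted) scalar behind an anchor loads as the same string as the quoted form, and that an alias resolves to it. This is a sketch under assumptions: the `plain` and `quoted` key names are made up for the comparison, and it uses `yaml.safe_load` directly rather than PaddleOCR's own config loader.

```python
import yaml  # PyYAML

CONFIG = """
Global:
  plain: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
  quoted: 'xyxyxyxy'
Metric:
  box_format: *box_format
"""

cfg = yaml.safe_load(CONFIG)
plain = cfg["Global"]["plain"]        # unquoted scalar, trailing comment dropped
quoted = cfg["Global"]["quoted"]      # single-quoted scalar
alias = cfg["Metric"]["box_format"]   # alias pointing at the anchored value
print(repr(plain), repr(quoted), repr(alias))
```

All three values load as the same Python string, which is consistent with the report that the format is read correctly without quotes.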

plotnine1219 commented 3 days ago

> That's fine❤️❤️❤️

I think I've found the problem. Every table in my dataset is empty, with no text in the cells. At line 730 of data/imaug/label_ops.py:

    if 'bbox' in cells[bbox_idx] and len(cells[bbox_idx]['tokens']) == 0:
        bbox = cells[bbox_idx]['bbox'].copy()
        bbox = np.array(bbox, dtype=np.float32).reshape(-1)
        bboxes[i] = bbox
        bbox_masks[i] = 1.0

This if branch is never entered, so the dataloader never reads the bbox positions.
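The effect described here can be reproduced in isolation. Note that the condition as quoted tests `== 0`, which an empty cell would satisfy, so the description only fits a check that requires non-empty tokens; the sketch below assumes that reading. It is a simplified stand-in, not the PaddleOCR encoder: `encode_boxes` and its `require_tokens` flag are hypothetical, and it maps one cell to one row instead of walking the structure tokens.

```python
import numpy as np


def encode_boxes(cells, max_len, loc_reg_num, require_tokens):
    """Fill bboxes/bbox_masks, optionally skipping cells without tokens."""
    bboxes = np.zeros((max_len, loc_reg_num), dtype=np.float32)
    bbox_masks = np.zeros((max_len, 1), dtype=np.float32)
    for i, cell in enumerate(cells):
        has_text = len(cell["tokens"]) > 0
        if "bbox" in cell and (has_text or not require_tokens):
            # Flatten the nested point list into loc_reg_num coordinates.
            bboxes[i] = np.array(cell["bbox"], dtype=np.float32).reshape(-1)
            bbox_masks[i] = 1.0
    return bboxes, bbox_masks


# Empty-table cells in the same shape as the annotation in this thread.
cells = [
    {"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]},
    {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]},
]

strict_boxes, strict_masks = encode_boxes(cells, 4, 8, require_tokens=True)
relaxed_boxes, relaxed_masks = encode_boxes(cells, 4, 8, require_tokens=False)
print(strict_masks.sum(), relaxed_masks.sum())
```

With the token requirement in place, every row and every mask stays at zero for an empty-table dataset; dropping the requirement fills in the two boxes.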

UserWangZz commented 3 days ago

OK.