plotnine1219 commented 2 weeks ago

Please provide the following information to quickly locate the problem

System Environment:
Version: Paddle: PaddleOCR: Related components:
Command Code:
Complete Error Message:
My config file:

    Global:
      use_gpu: False
      epoch_num: 10
      log_smooth_window: 20
      print_batch_step: 20
      save_model_dir: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/output/SLANet_ch
      save_epoch_step: 400
      # evaluation is run every 331 iterations after the 0th iteration
      eval_batch_step: [0, 331]
      cal_metric_during_train: True
      pretrained_model:
      checkpoints:
      save_inference_dir: ./output/SLANet_ch/infer
      use_visualdl: False
      infer_img: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/500_table
      # for data or label process
      character_dict_path: /Users/pengkang01/Desktop/txt转matrix/PaddleOCR/ppocr/utils/dict/table_structure_dict_ch.txt
      character_type: en
      max_text_length: &max_text_length 500
      box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
      infer_mode: False
      # use_sync_bn: True
      use_sync_bn: False
      save_res_path: output/infer

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      clip_norm: 5.0
      lr:
        learning_rate: 0.001
      regularizer:
        name: 'L2'
        factor: 0.00000

    Architecture:
      model_type: table
      algorithm: SLANet
      Backbone:
        name: PPLCNet
        scale: 1.0
        pretrained: True
        use_ssld: True
      Neck:
        name: CSPPAN
        out_channels: 96
      Head:
        name: SLAHead
        hidden_size: 256
        max_text_length: *max_text_length
        loc_reg_num: &loc_reg_num 8

    Loss:
      name: SLALoss
      structure_weight: 1.0
      loc_weight: 2.0
      loc_loss: smooth_l1

    PostProcess:
      name: TableLabelDecode
      merge_no_span_structure: &merge_no_span_structure True

    Metric:
      name: TableMetric
      main_indicator: acc
      compute_bbox_metric: True
      loc_reg_num: *loc_reg_num
      box_format: *box_format
      del_thead_tbody: True

    Train:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table/
        label_file_list: [500_table/train.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: True
        batch_size_per_card: 1
        # batch_size_per_card: 48
        drop_last: True
        # num_workers: 1
        num_workers: 0

    Eval:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table/
        label_file_list: [500_table/val.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.485, 0.456, 0.406]
              std: [0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: False
        drop_last: False
        # batch_size_per_card: 48
        # num_workers: 1
        batch_size_per_card: 1
        num_workers: 0

Eval result:

    [2024/04/29 15:55:18] ppocr INFO: metric eval ***
    [2024/04/29 15:55:18] ppocr INFO: acc:0.9999990000010001
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_precision:0.0
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_recall:0
    [2024/04/29 15:55:18] ppocr INFO: bbox_metric_hmean:0
    [2024/04/29 15:55:18] ppocr INFO: fps:1.2853043121990564

UserWangZz commented 2 weeks ago

Please check whether the annotation format is correct.

plotnine1219 commented 2 weeks ago

> Please check whether the annotation format is correct.

The annotation format is fine.

UserWangZz commented 1 week ago

Hi, could you debug the data loading process and check whether the bboxes are being read correctly?
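One way to start such a check, before stepping into the dataloader itself, is to inspect a label line directly. The following is a minimal sketch, assuming PubTabNet-style JSON label lines; `summarize_cells` and the two-cell sample are hypothetical and not part of PaddleOCR.

```python
import json


def summarize_cells(label_line: str) -> dict:
    """Count the cells, the cells carrying a bbox, and the cells with no tokens."""
    info = json.loads(label_line)
    cells = info["html"]["cells"]
    return {
        "cells": len(cells),
        "with_bbox": sum(1 for c in cells if "bbox" in c),
        "empty_tokens": sum(1 for c in cells if len(c.get("tokens", [])) == 0),
    }


# Hypothetical label line: one empty cell and one cell containing text.
sample = json.dumps({
    "filename": "example.png",
    "html": {
        "structure": {"tokens": ["<tr>", "<td></td>", "<td></td>", "</tr>"]},
        "cells": [
            {"tokens": [], "bbox": [30, 30, 246, 30, 246, 93, 30, 93]},
            {"tokens": ["a"], "bbox": [246, 30, 1096, 30, 1096, 93, 246, 93]},
        ],
    },
})

summary = summarize_cells(sample)
print(summary)
```

Running it over every line of the real label file would show how many cells have a bbox but no text, which is the case that matters later in this thread.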

plotnine1219 commented 1 week ago

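> Hi, could you debug the data loading process and check whether the bboxes are being read correctly?

Hi, they are read correctly, but during eval the predicted bbox coordinates are matched against bbox_mask for the calculation.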

UserWangZz commented 1 week ago

Has the problem been solved?

plotnine1219 commented 1 week ago

> Has the problem been solved?

Not yet...

UserWangZz commented 1 week ago

Hi, could you share the command you ran? I'll look into it.

UserWangZz commented 1 week ago

What are your paddle and paddleocr versions?

plotnine1219 commented 1 week ago

> What are your paddle and paddleocr versions?

Hi, paddleocr-2.7.4 and paddle-2.5.1. Config file:

    Global:
      use_gpu: False
      epoch_num: 300
      log_smooth_window: 20
      print_batch_step: 20
      save_model_dir: ./output/SLANet_ch/613_no_xuanzhuan_padding_LCPAN
      save_epoch_step: 400
      # evaluation is run every 331 iterations after the 0th iteration
      eval_batch_step: [0, 331]
      cal_metric_during_train: True
      pretrained_model:
      checkpoints:
      save_inference_dir: ./output/SLANet_ch/613_no_xuanzhuan/infer/
      use_visualdl: False
      infer_img: ./500_table/
      # for data or label process
      character_dict_path: ppocr/utils/dict/table_structure_dict_ch.txt
      character_type: en
      max_text_length: &max_text_length 500
      box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'
      infer_mode: False
      # use_sync_bn: True
      use_sync_bn: False
      save_res_path: output/infer

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      clip_norm: 5.0
      lr:
        learning_rate: 0.001
      regularizer:
        name: 'L2'
        factor: 0.00000

    Architecture:
      model_type: table
      algorithm: SLANet
      Backbone:
        name: PPLCNet
        scale: 1.0
        pretrained: True
        use_ssld: True
      Neck:
        name: LCPAN
        out_channels: 96
      Head:
        name: SLAHead
        hidden_size: 256
        max_text_length: *max_text_length
        loc_reg_num: &loc_reg_num 8

    Loss:
      name: SLALoss
      structure_weight: 1.0
      loc_weight: 2.0
      loc_loss: smooth_l1

    PostProcess:
      name: TableLabelDecode
      merge_no_span_structure: &merge_no_span_structure True

    Metric:
      name: TableMetric
      main_indicator: acc
      compute_bbox_metric: True
      loc_reg_num: *loc_reg_num
      box_format: *box_format
      del_thead_tbody: True

    Train:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table_no_xuanzhuan
        label_file_list: [500_table_no_xuanzhuan_padding/train.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.93135516, 0.93246497, 0.93411841] #[0.485, 0.456, 0.406]
              std: [0.1713343, 0.17117019, 0.17039258] #[0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: True
        batch_size_per_card: 4
        # batch_size_per_card: 48
        drop_last: True
        # num_workers: 1
        num_workers: 0

    Eval:
      dataset:
        name: PubTabDataSet
        data_dir: 500_table_no_xuanzhuan/
        label_file_list: [500_table_no_xuanzhuan_padding/val.txt]
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: False
          - TableLabelEncode:
              learn_empty_box: True
              merge_no_span_structure: *merge_no_span_structure
              replace_empty_cell_token: False
              loc_reg_num: *loc_reg_num
              max_text_length: *max_text_length
          - TableBoxEncode:
              in_box_format: *box_format
              out_box_format: *box_format
          - ResizeTableImage:
              max_len: 488
          - NormalizeImage:
              scale: 1./255.
              mean: [0.93135516, 0.93246497, 0.93411841] #[0.485, 0.456, 0.406]
              std: [0.1713343, 0.17117019, 0.17039258] #[0.229, 0.224, 0.225]
              order: 'hwc'
          - PaddingTableImage:
              size: [488, 488]
          - ToCHWImage:
          - KeepKeys:
              keep_keys: [ 'image', 'structure', 'bboxes', 'bbox_masks', 'shape' ]
      loader:
        shuffle: False
        drop_last: False
        # batch_size_per_card: 48
        # num_workers: 1
        batch_size_per_card: 4
        num_workers: 0

UserWangZz commented 1 week ago

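Hi, you could run tools/infer_table.py to visualize the inference results and see whether the model output looks normal. Then we can check where the box evaluation goes wrong.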

plotnine1219 commented 1 week ago

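> Hi, you could run tools/infer_table.py to visualize the inference results and see whether the model output looks normal. Then we can check where the box evaluation goes wrong.

With infer_table, apart from the predictions being inaccurate, nothing is wrong. The only problem is the three table box metrics in eval: they are all 0.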

UserWangZz commented 1 week ago

Try switching to the 2.7 branch. If that still doesn't work, I'll try to reproduce it on my side.

plotnine1219 commented 1 week ago

> Try switching to the 2.7 branch. If that still doesn't work, I'll try to reproduce it on my side.

I just tried 2.7 and it doesn't work either.

UserWangZz commented 1 week ago

OK, I'll try to reproduce it on my side.

plotnine1219 commented 6 days ago

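> OK, I'll try to reproduce it on my side.

On the 2.7 branch, line 718 of ppocr/data/imaug/label_ops.py (under `# encode box`) has `bboxes = np.zeros( (self._max_text_len, self.loc_reg_num), dtype=np.float32)`, which creates an all-zero 2-D array. That array is then used against the predicted boxes in the subsequent IoU and loss calculations. I think this is where the problem is.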
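To illustrate why ground-truth rows that stay all-zero would drive the box metrics to 0, here is a minimal sketch of an IoU computation. It assumes axis-aligned `(x1, y1, x2, y2)` boxes for brevity, whereas the config in this thread uses the 8-value `xyxyxyxy` format; `iou_xyxy` is a hypothetical helper, not the function PaddleOCR's metric actually calls.

```python
def iou_xyxy(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = (30.0, 30.0, 246.0, 93.0)        # a plausible predicted cell box
gt_unfilled = (0.0, 0.0, 0.0, 0.0)      # a row that was never overwritten
gt_filled = (30.0, 30.0, 246.0, 93.0)   # the same cell with its annotation

iou_unfilled = iou_xyxy(pred, gt_unfilled)
iou_filled = iou_xyxy(pred, gt_filled)
print(iou_unfilled, iou_filled)
```

A zero-area ground-truth box can never overlap a prediction, so any IoU-thresholded precision or recall built on such rows stays at 0 no matter how good the predictions are.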

UserWangZz commented 5 days ago

Hi, the reproduction on my side gives normal results.

plotnine1219 commented 5 days ago

> OK, I'll try to reproduce it on my side.

(https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.7/ppocr/data/imaug) /label_ops.py, line 816: my understanding is that it should read the bbox inside each cell.

> Hi, the reproduction on my side gives normal results.

Hi, but in our case the boxes are not being trained at all. Here is my annotation:

{"html": {"structure": {"tokens": ["", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""]}, "cells": [{"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]}, {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]}, {"tokens": [], "bbox": [[[30, 93], [246, 93], [246, 173], [30, 173]]]}, {"tokens": [], "bbox": [[[246, 93], [1096, 93], [1096, 173], [246, 173]]]}, {"tokens": [], "bbox": [[[30, 173], [246, 173], [246, 295], [30, 295]]]}, {"tokens": [], "bbox": [[[246, 173], [1096, 173], [1096, 295], [246, 295]]]}, {"tokens": [], "bbox": [[[30, 295], [246, 295], [246, 376], [30, 376]]]}, {"tokens": [], "bbox": [[[246, 295], [1096, 295], [1096, 376], [246, 376]]]}, {"tokens": [], "bbox": [[[30, 376], [246, 376], [246, 459], [30, 459]]]}, {"tokens": [], "bbox": [[[246, 376], [1096, 376], [1096, 459], [246, 459]]]}]}, "filename": "（已压缩）AHLY〔2022〕086号龙源电力安徽来安三湾风电项目风电机组设备采购合同－工程建设部－李纳－2022.8.23(2).pdf180.png_table_1.png"}

The tokens in structure may not display correctly here.

UserWangZz commented 5 days ago

`box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'` I suspect this is where the problem is. Try `box_format: 'xyxyxyxy'` instead.

plotnine1219 commented 5 days ago

> box_format: &box_format xyxyxyxy # 'xywh', 'xyxy', 'xyxyxyxy'

That doesn't work either...

UserWangZz commented 5 days ago

`box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy'` Could it be related to the missing quotes?

plotnine1219 commented 4 days ago

> box_format: &box_format 'xyxyxyxy' # 'xywh', 'xyxy', 'xyxyxyxy' Could it be related to the missing quotes?

The ops code does read the format here.

UserWangZz commented 4 days ago

That's fine❤️❤️❤️

plotnine1219 commented 4 days ago

> That's fine❤️❤️❤️

No, no, it's not solved yet. What I meant just now is that the format is read correctly even without the quotes.

plotnine1219 commented 3 days ago

> That's fine❤️❤️❤️

I think I've found the problem. Every table in my dataset is an empty table with no text. At line 730 of data/imaug/label_ops.py,

    if 'bbox' in cells[bbox_idx] and len(cells[bbox_idx]['tokens']) == 0:
        bbox = cells[bbox_idx]['bbox'].copy()
        bbox = np.array(bbox, dtype=np.float32).reshape(-1)
        bboxes[i] = bbox
        bbox_masks[i] = 1.0

this `if` check is never entered, so the dataloader never reads the bbox positions.
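The behaviour described above can be sketched in plain Python. This is a simplified illustration, assuming the check only fills a bbox for cells with non-empty `tokens`, which is what the description implies (the snippet as quoted shows `== 0`). `flatten`, `encode_boxes`, and the `require_tokens` switch are hypothetical names for this sketch, not PaddleOCR APIs, and the loop iterates over cells directly rather than over structure tokens.

```python
def flatten(values):
    """Recursively flatten nested lists of numbers into one flat list of floats."""
    out = []
    for v in values:
        if isinstance(v, (list, tuple)):
            out.extend(flatten(v))
        else:
            out.append(float(v))
    return out


def encode_boxes(cells, require_tokens=True, coords=8):
    """Start from all-zero rows and fill only the cells that pass the check."""
    bboxes = [[0.0] * coords for _ in cells]
    masks = [0.0] * len(cells)
    for i, cell in enumerate(cells):
        has_tokens = len(cell.get("tokens", [])) > 0
        if "bbox" in cell and (has_tokens or not require_tokens):
            bboxes[i] = flatten(cell["bbox"])
            masks[i] = 1.0
    return bboxes, masks


# Two cells taken from the annotation posted earlier in this thread.
empty_cells = [
    {"tokens": [], "bbox": [[[30, 30], [246, 30], [246, 93], [30, 93]]]},
    {"tokens": [], "bbox": [[[246, 30], [1096, 30], [1096, 93], [246, 93]]]},
]

strict_boxes, strict_masks = encode_boxes(empty_cells, require_tokens=True)
relaxed_boxes, relaxed_masks = encode_boxes(empty_cells, require_tokens=False)
print(strict_boxes[0], relaxed_boxes[0])
```

With the strict check, a dataset made only of empty tables leaves every ground-truth row at zero. Relaxing the check, or giving each cell a placeholder token in the labels, would be two possible ways to get the annotated boxes through; neither has been confirmed by the maintainers in this thread.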

UserWangZz commented 3 days ago

OK.

PaddlePaddle / PaddleOCR

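When evaluating table structure and cell coordinates, the HTML structure prediction acc is 0.999, so why are the recall, precision, and other metrics for the detected boxes all 0? After some investigation, it looks like the GT bboxes are not being read, which makes the eval metrics all 0. How should this be fixed? #12024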
