AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Apache License 2.0

LORE: logical location prediction accuracy issue with the WTW model #66

Open kwankwankoo opened 11 months ago

kwankwankoo commented 11 months ago

Sorry to take up your time. When reproducing LORE with the scripts and weight files provided by the authors, the cell locations are predicted accurately, but the logical locations differ considerably from what the paper reports: my logical location accuracy is 0.23, versus 0.86 reported in the paper. In the image below, cells with a purple background show the predicted logical locations and cells with no background color show the ground truth; they almost never match.

(screenshot: predicted logical locations in purple vs. uncolored ground truth)

How can I resolve this?

BrownXing commented 11 months ago

Can you provide the .sh scripts you ran, so that we can further investigate what causes the issue?

As for the case you provided, the problem is caused by a mis-detection that recognizes the background as a top row, which results in a +1 shift of the column numbers of the cells. The mis-detected row should be labeled as (x, x, 0, 0), but its position falls outside the image. To solve this, you can further fine-tune the model on your own dataset to get more accurate results.
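If fine-tuning is not immediately possible, a lightweight post-processing heuristic can also mitigate this particular failure: drop predicted cells whose boxes fall outside the image, then shift the remaining logical indices so they start at 0 again. Below is a minimal sketch of that idea; the `box`/`axis` field names and the (row_start, row_end, col_start, col_end) ordering are assumptions for illustration, not LORE's exact output format.

import numpy as np

def drop_out_of_image_cells(cells, img_w, img_h):
    """Heuristic sketch (not part of the LORE codebase): drop predicted cells
    whose boxes lie outside the image, then shift the remaining logical
    indices so rows and columns start at 0 again."""
    kept = []
    for c in cells:
        xs = np.array(c['box'][0::2], dtype=float)   # x coordinates of the 4 corners
        ys = np.array(c['box'][1::2], dtype=float)   # y coordinates of the 4 corners
        # keep the cell only if its centre lies inside the image
        if 0 <= xs.mean() < img_w and 0 <= ys.mean() < img_h:
            kept.append(c)
    if not kept:
        return kept
    # re-normalize logical coordinates so the smallest remaining row/col is 0
    min_row = min(c['axis'][0] for c in kept)
    min_col = min(c['axis'][2] for c in kept)
    for c in kept:
        r0, r1, c0, c1 = c['axis']
        c['axis'] = [r0 - min_row, r1 - min_row, c0 - min_col, c1 - min_col]
    return kept

This only covers the specific case where the spurious row or column lies outside the image; fine-tuning remains the more robust fix.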

kwankwankoo commented 11 months ago

> Can you provide the .sh scripts you ran, so that we can further investigate what causes the issue?
>
> As for the case you provided, the problem is caused by a mis-detection that recognizes the background as a top row, which results in a +1 shift of the column numbers of the cells. The mis-detected row should be labeled as (x, x, 0, 0), but its position falls outside the image. To solve this, you can further fine-tune the model on your own dataset to get more accurate results.

The demo_wired.sh script uses the model /ckpt_wtw/model_best.pth you provided, and the test data is the test set of the WTW dataset.

python demo.py ctdet \
        --dataset table \
        --demo ../input_images/wired \
        --demo_name demo_wired \
        --debug 1 \
        --arch dla_34 \
        --K 3000 \
        --MK 5000 \
        --tsfm_layers 3 \
        --stacking_layers 3 \
        --gpus 1 \
        --gpu 1 \
        --wiz_4ps \
        --wiz_detect \
        --wiz_rev \
        --wiz_stacking \
        --convert_onnx 0 \
        --vis_thresh_corner 0.3 \
        --vis_thresh 0.20 \
        --scores_thresh 0.2 \
        --nms \
        --upper_left \
        --demo_dir ../visualization_wired/ \
        --output_dir ../visualization_wired/ \
        --load_model ../dir_of_ckpt/ckpt_wtw/model_best.pth \
        --load_processor ../dir_of_ckpt/ckpt_wtw/processor_best.pth

In addition, I used the parameters you provided for training, and the logical location accuracy on WTW's test set was only 0.4.
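For reference, the logical location accuracy quoted here and in the paper is typically computed as a cell-level exact-match metric: a cell counts as correct only when all four logical indices match the ground truth. A minimal sketch of such a metric, assuming predictions and labels have already been matched cell by cell (illustrative only, not the repository's evaluation script):

def logical_location_accuracy(preds, gts):
    """Fraction of cells whose predicted (row_start, row_end, col_start, col_end)
    tuple exactly matches the ground truth. `preds` and `gts` map a shared
    cell id to its logical tuple; the names and structure are illustrative."""
    if not gts:
        return 0.0
    correct = sum(1 for cid, axis in gts.items()
                  if tuple(preds.get(cid, ())) == tuple(axis))
    return correct / len(gts)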

This is the training script.

python main.py ctdet \
    --dataset_name table \
    --exp_id training_wtw \
    --dataset_name WTW \
    --image_dir ../data/WTW/images \
    --wiz_4ps \
    --wiz_stacking \
    --wiz_pairloss \
    --tsfm_layers 3 \
    --stacking_layers 3 \
    --batch_size 8 \
    --master_batch 2 \
    --arch dla_34 \
    --lr 1e-4 \
    --K 500 \
    --MK 1000 \
    --num_epochs 100 \
    --lr_step '70, 90' \
    --gpus 2,3,4,5 \
    --num_workers 16 \
    --val_intervals 10

Hoping for your reply soon, thanks.

kwankwankoo commented 11 months ago

Oh, right: I fine-tuned entirely on your data and ran your released code exactly as-is, without any changes. Please reply as soon as possible, thank you!

xiaohualearn commented 10 months ago

> Oh, right: I fine-tuned entirely on your data and ran your released code exactly as-is, without any changes. Please reply as soon as possible, thank you!

Hello, when you run on multiple GPUs, what code or settings need to be changed?
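For what it's worth, CenterNet-style training code like the scripts above usually only needs a single comma-separated --gpus value (no spaces) together with --batch_size and --master_batch; the device list and per-GPU chunk sizes are derived internally. A minimal sketch of that kind of option parsing (illustrative, not LORE's actual opts.py):

import argparse

# Sketch of CenterNet-style GPU option handling (illustrative, not LORE's code).
parser = argparse.ArgumentParser()
parser.add_argument('--gpus', default='0', help='comma-separated ids, e.g. "2,3,4,5" (no spaces)')
parser.add_argument('--batch_size', type=int, default=8)
parser.add_argument('--master_batch', type=int, default=-1)
opt = parser.parse_args(['--gpus', '2,3,4,5', '--batch_size', '8', '--master_batch', '2'])

opt.gpus = [int(g) for g in opt.gpus.split(',')]        # [2, 3, 4, 5]
master = opt.master_batch if opt.master_batch > 0 else opt.batch_size // len(opt.gpus)
rest = opt.batch_size - master
# per-GPU chunk sizes: the first (master) GPU gets `master`, the rest share the remainder
chunk_sizes = [master] + [rest // (len(opt.gpus) - 1)] * (len(opt.gpus) - 1)
print(opt.gpus, chunk_sizes)                            # [2, 3, 4, 5] [2, 2, 2, 2]

Note that writing the ids with spaces (e.g. --gpus 2, 3, 4, 5) makes the shell split them into separate arguments, which argparse will reject.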

xiaohualearn commented 10 months ago

Have you solved this problem?

PVToan62 commented 10 months ago

The demo_wired.sh script uses the model /ckpt_wtw/model_best.pth you provided, and the test data is the test set of the WTW dataset (https://tianchi.aliyun.com/dataset/108587).

(screenshot)

Additionally, I changed some parameters such as batch_size and gpus for training, and the logical location accuracy on WTW's test set was only 0.55.

This is the training script.

(screenshot of the training script)

Have you fixed this yet?

xiaohualearn commented 10 months ago

Thank you very much for your reply!

I have a new problem. I visualized the WTW annotations and found that the logical positions in complex tables are biased, some of them severely. Have you run into this problem? Do you think there is something wrong with my dataset? Looking forward to your reply!
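If you want to check whether the bias comes from the annotation files themselves, one option is to draw each annotated cell together with its logical coordinates and inspect a few complex tables by eye. A minimal sketch, assuming each cell record carries a 4-corner `box` and a (row_start, row_end, col_start, col_end) `axis` field (these names are placeholders, not the exact WTW/LORE annotation schema):

import cv2
import numpy as np

def visualize_logical_labels(image_path, cells, out_path):
    """Draw each annotated cell box and print its logical coordinates
    at the box centre. Field names are illustrative, not the exact schema."""
    img = cv2.imread(image_path)
    for cell in cells:
        pts = np.array(cell['box'], dtype=np.int32).reshape(-1, 2)   # 4 corner points
        cv2.polylines(img, [pts.reshape(-1, 1, 2)], True, (0, 255, 0), 2)
        cx, cy = pts.mean(axis=0).astype(int)
        r0, r1, c0, c1 = cell['axis']
        cv2.putText(img, f'{r0},{r1},{c0},{c1}', (int(cx), int(cy)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    cv2.imwrite(out_path, img)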


HungVu307 commented 7 months ago

Hi all, the GeoLayoutLM model was exported to ONNX successfully. Please check my code via the link.

If it is useful, please give it a ⭐. If you hit errors when exporting it yourself, feel free to discuss them with me!

yangy996 commented 6 months ago

> In addition, I used the parameters you provided for training, and the logical location accuracy on WTW's test set was only 0.4.

How did you get that 0.4 accuracy? I don't see this value when I train.

yangy996 commented 6 months ago

Has this been solved yet? The accuracy I get is also very low.