PFCCLab / PPOCRLabel

PPOCRLabelv2 is a semi-automatic graphical annotation tool for OCR, with a built-in PP-OCR model that automatically detects and re-recognizes data.

Bug: the HTML in the gt field of the gt.txt file generated by table annotation does not match the table layout #57

Closed. freezehe closed this issue 3 months ago

freezehe commented 3 months ago

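Hello everyone, I ran into a problem while annotating a table and would appreciate some help. The table I annotated is a simple one. After watching the table annotation training video, I followed it while annotating and also corrected the cell order. However, when I copied the value of the gt attribute from the generated gt.txt into a text file and opened it as an HTML page, the rendered table did not match the layout of the original table. Here are the images and the annotation file involved (attachments: gt.txt, gt.txt, image, image). Is this a bug? The gt value is supposed to be the HTML of the table, yet it does not correspond to the table layout.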
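The copy-and-paste step described above can also be done programmatically. The sketch below assumes that gt.txt holds one JSON object per line with a `gt` key, as in the record quoted later in this thread; the function names and the output file naming are hypothetical and are not part of PPOCRLabel. Parsing with `json.loads` also removes JSON escapes such as `\"`, which would otherwise survive a manual copy and may stop a browser from reading `rowspan` correctly.

```python
import json
from pathlib import Path


def extract_gt_html(record_line: str) -> str:
    """Parse one gt.txt line (assumed to be a JSON object) and return its "gt" HTML."""
    record = json.loads(record_line)
    return record["gt"]


def dump_previews(gt_path, out_dir):
    """Write one HTML preview file per non-empty line of gt.txt.

    Returns the list of files written. The preview_<n>.html naming is
    an arbitrary choice made for this sketch.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(gt_path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            target = out_dir / f"preview_{index}.html"
            target.write_text(extract_gt_html(line), encoding="utf-8")
            written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical paths; adjust to where your annotation output lives.
    for path in dump_previews("gt.txt", "gt_previews"):
        print(path)
```

Opening the generated files in a browser shows the table exactly as the `gt` string describes it, which makes it easier to compare against the source image.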

GreatV commented 3 months ago

Please provide the original image.

freezehe commented 3 months ago

Original image: ![Uploading 2.jpg…]()

freezehe commented 3 months ago

> Please provide the original image.

image

freezehe commented 3 months ago

My version: image

GreatV commented 3 months ago
image
{"filename": "361634212-9a6fcc89-ca58-485a-835f-dce24b678e8d.png", "html": {"structure": {"tokens": ["<tbody>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td", " rowspan=\"6\"", ">", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>", "<td>", "</td>", "</tr>", "</tbody>"]}, "cells": [{"tokens": ["实", "际", "压", "力", "值", "(", "b", "a", "r", ")"], "bbox": [[162, 54], [441, 54], [441, 92], [162, 92]]}, {"tokens": ["被", "校", "仪", "器", "示", "值", "(", "b", "a", "r", ")"], "bbox": [[625, 44], [943, 44], [943, 92], [625, 92]]}, {"tokens": ["测", "量", "不", "确", "定", "度", "(", "k", "=", "2", ")"], "bbox": [[1100, 56], [1424, 56], [1424, 92], [1100, 92]]}, {"tokens": ["-", "1", ".", "0", "0", "0"], "bbox": [[260, 103], [360, 103], [360, 142], [260, 142]]}, {"tokens": ["-", "0", ".", "9", "9", "1"], "bbox": [[740, 103], [838, 103], [838, 142], [740, 142]]}, {"tokens": [], "bbox": [[1168, 101], [1347, 101], [1347, 142], [1168, 142]]}, {"tokens": ["0", ".", "0", "0", "0"], "bbox": [[264, 151], [352, 151], [352, 190], [264, 190]]}, {"tokens": ["0", ".", "0", "1", "1"], "bbox": [[747, 153], [837, 153], [837, 192], [747, 192]]}, {"tokens": [], "bbox": [[1204, 150], [1293, 150], [1293, 189], [1204, 189]]}, {"tokens": ["1", ".", "0", "0", "0"], "bbox": [[267, 196], [359, 196], [359, 235], [267, 235]]}, {"tokens": ["1", ".", "0", "1", "0"], "bbox": [[756, 199], [844, 199], [844, 238], [756, 238]]}, {"tokens": [], "bbox": [[1165, 194], [1332, 194], [1332, 232], [1165, 232]]}, {"tokens": ["2", ".", "0", "0", 
"0"], "bbox": [[266, 252], [356, 252], [356, 291], [266, 291]]}, {"tokens": ["2", ".", "0", "1", "1"], "bbox": [[745, 252], [835, 252], [835, 290], [745, 290]]}, {"tokens": ["U", "r", "e", "l", "=", "0", ".", "0", "7", "%"], "bbox": [[1189, 250], [1366, 250], [1366, 285], [1189, 285]]}, {"tokens": ["3", ".", "0", "0", "0"], "bbox": [[264, 301], [354, 301], [354, 339], [264, 339]]}, {"tokens": ["3", ".", "0", "1", "2"], "bbox": [[752, 304], [840, 304], [840, 343], [752, 343]]}, {"tokens": [], "bbox": [[1224, 294], [1314, 294], [1314, 331], [1224, 331]]}, {"tokens": ["4", ".", "0", "0", "0"], "bbox": [[263, 346], [359, 346], [359, 387], [263, 387]]}, {"tokens": ["4", ".", "0", "1", "2"], "bbox": [[741, 346], [843, 346], [843, 385], [741, 385]]}, {"tokens": [], "bbox": [[1186, 344], [1353, 344], [1353, 382], [1186, 382]]}, {"tokens": ["5", ".", "0", "0", "0"], "bbox": [[259, 398], [360, 398], [360, 439], [259, 439]]}, {"tokens": ["5", ".", "0", "1", "2"], "bbox": [[741, 398], [842, 398], [842, 435], [741, 435]]}, {"tokens": [], "bbox": [[1192, 393], [1359, 393], [1359, 431], [1192, 431]]}]}, "gt": "<html><body><table><tbody><tr><td></td><td></td><td></td></tr><tr><td>实际压力值(bar)</td><td>被校仪器示值(bar)</td><td rowspan=\"6\">测量不确定度(k=2)</td></tr><tr><td>-1.000</td><td>-0.991</td><td></td></tr><tr><td>0.000</td><td>0.011</td><td></td></tr><tr><td>1.000</td><td>1.010</td><td></td></tr><tr><td>2.000</td><td>2.011</td><td>Urel=0.07%</td></tr><tr><td>3.000</td><td>3.012</td><td></td></tr><tr><td>4.000</td><td>4.012</td><td></td></tr><tr><td>5.000</td><td>5.012</td><td></td></tr></tbody></table></body></html>"}
<html>

<body>
    <table>
        <tbody>
            <tr>
                <td></td>
                <td></td>
                <td></td>
            </tr>
            <tr>
                <td>实际压力值(bar)</td>
                <td>被校仪器示值(bar)</td>
                <td rowspan="6">测量不确定度(k=2)</td>
            </tr>
            <tr>
                <td>-1.000</td>
                <td>-0.991</td>
                <td></td>
            </tr>
            <tr>
                <td>0.000</td>
                <td>0.011</td>
                <td></td>
            </tr>
            <tr>
                <td>1.000</td>
                <td>1.010</td>
                <td></td>
            </tr>
            <tr>
                <td>2.000</td>
                <td>2.011</td>
                <td>Urel=0.07%</td>
            </tr>
            <tr>
                <td>3.000</td>
                <td>3.012</td>
                <td></td>
            </tr>
            <tr>
                <td>4.000</td>
                <td>4.012</td>
                <td></td>
            </tr>
            <tr>
                <td>5.000</td>
                <td>5.012</td>
                <td></td>
            </tr>
        </tbody>
    </table>
</body>

</html>
image

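Annotating tables that contain merged cells is fairly cumbersome: the cell numbering has to be adjusted, and empty cells must be annotated as well.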
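One way to spot this kind of problem before rendering is to check the structure tokens themselves. The following is a minimal sketch, not PPOCRLabel code: it assumes the token layout seen in the record above (`"<td>"` for a plain cell, and `"<td"`, an attribute token, `">"` for a spanning cell), and the helper names are hypothetical. It counts the cell openings and expands `rowspan`/`colspan` to see how many columns each row occupies. In the record above, the structure appears to contain 27 cell openings against 24 entries in `cells`, and the rows below the `rowspan="6"` cell still carry three `<td>` each, so they would occupy four columns.

```python
import re


def count_td(tokens):
    """Count cell openings: "<td>" (plain) or "<td" (cell with span attributes)."""
    return sum(1 for tok in tokens if tok in ("<td>", "<td"))


def parse_rows(tokens):
    """Group structure tokens into rows of (rowspan, colspan) tuples."""
    rows, current, span = [], None, None
    for tok in tokens:
        if tok == "<tr>":
            current = []
        elif tok == "</tr>":
            rows.append(current)
            current = None
        elif tok == "<td>":
            current.append((1, 1))
        elif tok == "<td":
            span = {"rowspan": 1, "colspan": 1}
        elif span is not None and tok == ">":
            current.append((span["rowspan"], span["colspan"]))
            span = None
        elif span is not None:
            match = re.match(r'\s*(rowspan|colspan)="(\d+)"', tok)
            if match:
                span[match.group(1)] = int(match.group(2))
    return rows


def row_widths(rows):
    """Return the number of columns each row occupies once spans are expanded."""
    carry = {}  # column -> number of further rows covered by a rowspan from above
    widths = []
    for row in rows:
        covered = set(carry)
        used = set()
        next_carry = {c: n - 1 for c, n in carry.items() if n > 1}
        col = 0
        for rowspan, colspan in row:
            while col in covered or col in used:
                col += 1  # skip columns already taken in this row
            for c in range(col, col + colspan):
                used.add(c)
                if rowspan > 1:
                    next_carry[c] = rowspan - 1
            col += colspan
        widths.append(len(covered | used))
        carry = next_carry
    return widths


if __name__ == "__main__":
    # Small made-up example: the second row keeps a <td> under a rowspan cell.
    tokens = ["<tbody>", "<tr>", "<td>", "</td>", "<td", ' rowspan="2"', ">",
              "</td>", "</tr>", "<tr>", "<td>", "</td>", "<td>", "</td>",
              "</tr>", "</tbody>"]
    widths = row_widths(parse_rows(tokens))
    print("cell openings:", count_td(tokens))
    print("row widths:", widths)
    if len(set(widths)) > 1:
        print("rows occupy different numbers of columns; check merged cells")
```

If the row widths differ, or the number of cell openings differs from the length of `cells`, the annotation probably still needs its merged cells and numbering adjusted by hand.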

GreatV commented 3 months ago

I'll look into how to improve this later. In the meantime, PRs that improve the user experience are welcome.