doc-analysis / TableBank

TableBank: A Benchmark Dataset for Table Detection and Recognition
Apache License 2.0

Some tables not labeled #9

Closed by rockyzhengwu 4 years ago

rockyzhengwu commented 5 years ago

I found a problem in the data: some tables are not labeled. Here are two examples from Word.json:

{'category_id': 1, 'area': 46280, 'iscrowd': 0, 'segmentation': [[71, 176, 71, 280, 516, 280, 516, 176]], 'id': 69303, 'image_id': 53565, 'bbox': [71, 176, 445, 104]}
{'category_id': 1, 'area': 143613, 'iscrowd': 0, 'segmentation': [[66, 72, 66, 269, 795, 269, 795, 72]], 'id': 67935, 'image_id': 52492, 'bbox': [66, 72, 729, 197]}

[Image: 53565]

[Image: 52492]
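The two records quoted above are internally consistent, which suggests the problem is tables with no annotation at all rather than malformed ones. A minimal sketch of that check follows. It assumes `bbox` is `[x, y, width, height]` and `segmentation` is a flat list of x, y pairs, which is what the numbers in the two records imply; the helper names are hypothetical and not part of TableBank.

```python
from collections import defaultdict

# The two annotation records quoted in this issue, copied verbatim.
ANNOTATIONS = [
    {'category_id': 1, 'area': 46280, 'iscrowd': 0,
     'segmentation': [[71, 176, 71, 280, 516, 280, 516, 176]],
     'id': 69303, 'image_id': 53565, 'bbox': [71, 176, 445, 104]},
    {'category_id': 1, 'area': 143613, 'iscrowd': 0,
     'segmentation': [[66, 72, 66, 269, 795, 269, 795, 72]],
     'id': 67935, 'image_id': 52492, 'bbox': [66, 72, 729, 197]},
]


def bbox_from_polygon(polygon):
    """Return [x, y, width, height] enclosing a flat [x0, y0, x1, y1, ...] polygon."""
    xs = polygon[0::2]
    ys = polygon[1::2]
    x, y = min(xs), min(ys)
    return [x, y, max(xs) - x, max(ys) - y]


def is_consistent(ann):
    """Check that bbox matches the polygon extent and area equals width * height."""
    expected = bbox_from_polygon(ann['segmentation'][0])
    width, height = ann['bbox'][2], ann['bbox'][3]
    return ann['bbox'] == expected and ann['area'] == width * height


def tables_per_image(annotations):
    """Count annotations per image_id (hypothetical helper for spotting gaps)."""
    counts = defaultdict(int)
    for ann in annotations:
        counts[ann['image_id']] += 1
    return dict(counts)


for ann in ANNOTATIONS:
    print(ann['image_id'], 'consistent:', is_consistent(ann))
print(tables_per_image(ANNOTATIONS))
```

Comparing the per-image count with the number of tables visible on the page is one way to find images like the two above, where only one table carries a label.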

lumiaomiao commented 4 years ago

Hi, how did you download the dataset? I received the download link from the author, but I cannot download the dataset with either wget or a browser. Could you describe how you did it?

rockyzhengwu commented 4 years ago

@lumiaomiao Sorry, I can't remember how I did it.

doc-analysis commented 4 years ago

@rockyzhengwu The method we used cannot guarantee that all tables are marked. Tables are detected and labeled automatically by code, which means some errors may leave a few tables unlabeled.

We randomly sample 1,000 examples from the dataset and manually check the bounding boxes of tables. We observe that only 5 of them are incorrectly labeled, which demonstrates the high quality of this dataset.

For details, see our paper.
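For a sense of what that sample implies: 5 errors in 1,000 examples is an observed error rate of 0.5%. The sketch below also computes an approximate 95% interval around that rate. The Wilson score interval is an assumption here, chosen as a standard method for a proportion; nothing in this thread says how uncertainty was quantified, and the function name is hypothetical.

```python
import math


def wilson_interval(errors, n, z=1.96):
    """Approximate confidence interval for a proportion (Wilson score method).

    z=1.96 corresponds to roughly 95% confidence.
    """
    p = errors / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Figures taken from the comment above: 5 incorrect labels in 1,000 samples.
rate = 5 / 1000
low, high = wilson_interval(5, 1000)
print(f"observed error rate: {rate:.3%}")
print(f"approximate 95% interval: {low:.3%} to {high:.3%}")
```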

rockyzhengwu commented 4 years ago

Thanks