Academic-Hammer / SciTSR

Table structure recognition dataset of the paper: Complicated Table Structure Recognition
https://arxiv.org/pdf/1908.04729.pdf
MIT License

(probably) incorrect spanning cell labels #38

Open ejlee95 opened 2 years ago

ejlee95 commented 2 years ago

I recovered the table structure from the chunk (.chunk) and structure (.json) files. When I visualized the annotations, I found that some of them are wrong (for tables not included in scitsr-comp.list).

Each box in the image below marks a single cell (blue: non-empty, green: empty). I think the 'Original' cell and the 'code' cell should be merged into one cell, but they are separate in the given annotation (.json file).

[image: cell-structure]

[image: original image]

Is it okay to regard this table as a 'simple table'?

cjw94103 commented 2 years ago

Can you please tell me how you recovered the structure?

Thank You!

ejlee95 commented 2 years ago

@cjw94103

First, I matched the text chunks in the "chunks/.chunk" file with the cells in the "structure/.json" file.

Second, I grouped the x/y-coordinates of the chunks that share the same 'start' and 'end' indices.

Finally, I took the cell box x/y-coordinate between the (i-1)-th and i-th column/row as (max of end(i-1) + min of start(i)) / 2.
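The last two steps can be sketched roughly as follows. This is only an illustration of the midpoint rule described above, not the exact script used for the visualization. It assumes the chunks have already been matched to cells, so each cell carries its index range and the extent of its text along one axis. The function name `axis_boundaries`, the keys `x0`/`x1`, and the sample coordinates are hypothetical, and the `start_col`/`end_col` keys are an assumption about how the matched cells are stored.

```python
def axis_boundaries(cells, start_key, end_key, lo_key, hi_key):
    """Return {i: boundary between index i-1 and index i} along one axis.

    Each cell is a dict holding its start/end index and the low/high
    coordinate of its matched text chunks on that axis (assumed layout).
    """
    min_start = {}  # index -> smallest low coordinate of cells starting there
    max_end = {}    # index -> largest high coordinate of cells ending there
    for cell in cells:
        s, e = cell[start_key], cell[end_key]
        lo, hi = cell[lo_key], cell[hi_key]
        min_start[s] = min(min_start.get(s, lo), lo)
        max_end[e] = max(max_end.get(e, hi), hi)

    boundaries = {}
    for i in sorted(min_start):
        # A boundary exists only if something ends at i-1 and starts at i.
        if i - 1 in max_end:
            boundaries[i] = (max_end[i - 1] + min_start[i]) / 2
    return boundaries


# Made-up example: three columns, with one cell spanning columns 0-1.
cells = [
    {"start_col": 0, "end_col": 0, "x0": 10.0, "x1": 40.0},
    {"start_col": 1, "end_col": 1, "x0": 60.0, "x1": 90.0},
    {"start_col": 0, "end_col": 1, "x0": 12.0, "x1": 88.0},  # spanning cell
    {"start_col": 2, "end_col": 2, "x0": 110.0, "x1": 150.0},
]
col_bounds = axis_boundaries(cells, "start_col", "end_col", "x0", "x1")
print(col_bounds)
```

The same function should work for rows by passing the row index keys and the y-coordinates instead. A spanning cell contributes only its left edge to its start index and its right edge to its end index, so it does not pull the boundary between the columns it covers.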