TableBank: A Benchmark Dataset for Table Detection and Recognition

Table Detection data mismatch in Word subset #42

Open vm7608 opened 2 months ago

vm7608 commented 2 months ago

I downloaded the TableBank dataset from your dataset homepage and checked its annotations.

I found a mismatch. The README gives the number of tables for the Table Detection task as follows:

| Task | Word | Latex | Word+Latex |
|---|---|---|---|
| Table detection | 163,417 | 253,817 | 417,234 |

However, when I ran a script to count the annotations, it found only 101,889 tables in the Word subset, which is 61,528 fewer than the 163,417 stated in the README.
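For reference, a count like this can be reproduced with a short script. The sketch below is not the script used above; it assumes the detection annotations are a COCO-style JSON file with top-level `images` and `annotations` lists, where each annotation is one table bounding box. The function name `count_tables` and that file layout are assumptions for illustration.

```python
import json
from collections import Counter


def count_tables(annotation_path):
    """Count images and table annotations in a COCO-style JSON file.

    Assumes top-level "images" and "annotations" lists, with one
    annotation entry per table bounding box (an assumption, not
    confirmed by the dataset documentation).
    """
    with open(annotation_path, encoding="utf-8") as f:
        data = json.load(f)

    annotations = data.get("annotations", [])
    # Number of tables per image, keyed by the annotation's image_id.
    per_image = Counter(a["image_id"] for a in annotations)

    return {
        "images": len(data.get("images", [])),
        "tables": len(annotations),
        "images_with_tables": len(per_image),
    }
```

Calling `count_tables` on each split file of the Word subset and summing the `tables` values gives a total that can be compared with the README figure. Comparing `images` with `images_with_tables` also shows whether some images have no annotations at all.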