SuleyNL / Extractable

Table extraction library
MIT License
19 stars, 2 forks

Splitted table captions #15

Open MrUnknown789556 opened 1 year ago

MrUnknown789556 commented 1 year ago

I installed the package and ran it.

It generated all tables from the article smoothly. Very nice and impressive.

But I would like to point out a minor deficiency:

If the caption of a table is split over more than one line, only one part (the lowest line of the split caption text) is included in the extracted table.

With the attached test article, a "table" is also extracted that is not a table at all.

Best regards

Frank

The.pdf The_table_12 1 The_table_8 1

SuleyNL commented 11 months ago

Hi @MrUnknown789556, thanks for trying out Extractable. I take your feedback seriously and will work on it. Keep in mind that Extractable is a work in progress, so you can contribute by leaving valuable feedback and by making changes to the code.

As for your points:

  1. If the caption of a table is split over more than one line, only one part (the lowest line of the split caption text) is included in the extracted table.

    • [ ] #TODO I will be looking into this, there should be a solution to it.
  2. With the attached test article, a "table" is also extracted that is not a table at all.

    This is an unfortunate byproduct of Extractable's ability to recognize tables that have no ruling lines. It is a double-edged sword: sometimes there are tables without lines that we do want to detect, but in cases like this one the detector is fooled by plain text laid out in a table-like format.
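
For point 1, one possible direction is sketched below. It is not Extractable's actual code or API: `TextLine`, `merge_caption_lines`, `table_top`, and `max_gap` are hypothetical names, and the sketch assumes that text lines with vertical coordinates (y growing downward) are already available and that the caption sits above the table. Starting from the line closest to the table, it walks upward and keeps adding lines until the vertical gap becomes too large or a line that looks like the start of a caption ("Table 3 ...") is reached.

```python
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical text line; y coordinates grow downward on the page."""
    text: str
    top: float     # y of the upper edge
    bottom: float  # y of the lower edge


# Assumption: captions start with "Table <n>" or "Tab. <n>".
CAPTION_START = re.compile(r"^\s*(table|tab\.)\s*\d+", re.IGNORECASE)


def merge_caption_lines(lines, table_top, max_gap=12.0):
    """Join all lines of a caption located directly above a table."""
    # Only lines that end above the table, nearest to the table first.
    above = sorted(
        (ln for ln in lines if ln.bottom <= table_top),
        key=lambda ln: ln.bottom,
        reverse=True,
    )
    collected = []
    lower_edge = table_top
    for ln in above:
        # A large vertical gap means the caption block has ended.
        if lower_edge - ln.bottom > max_gap:
            break
        collected.append(ln)
        lower_edge = ln.top
        # The first caption line has been reached, so stop here.
        if CAPTION_START.match(ln.text):
            break
    # Lines were collected bottom-up; restore reading order.
    return " ".join(ln.text.strip() for ln in reversed(collected))


example_lines = [
    TextLine("Some body text ending a paragraph.", 100, 110),
    TextLine("Table 3: Mean values for all", 130, 140),
    TextLine("groups in the study.", 142, 152),
]
caption = merge_caption_lines(example_lines, table_top=158)
```

The gap threshold is a guess and would likely need to scale with font size; captions placed below the table would need the mirrored walk.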