Open AdBaWa opened 5 days ago
@AdBaWa, can you please try converting your tables with the option TableFormerMode.ACCURATE, as described here: control-pdf-table-extraction-options
This uses the version of our TableFormer that has more layers and parameters, so it might catch the nuances.
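For readers following along, here is a minimal sketch of how the suggested option might be set. The import paths and option names are as I recall them from the Docling documentation and may differ between versions; the helper name `build_accurate_converter` is hypothetical, and Docling is imported lazily so the snippet can be loaded without it installed.

```python
def build_accurate_converter():
    """Return a DocumentConverter configured for the ACCURATE TableFormer mode.

    Assumes Docling is installed; import paths may vary by Docling version.
    """
    # Imported inside the function so that merely defining it does not require Docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
    from docling.document_converter import DocumentConverter, PdfFormatOption

    # Enable table structure recognition and switch to the larger model.
    pipeline_options = PdfPipelineOptions(do_table_structure=True)
    pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE

    return DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
        }
    )
```

Calling `build_accurate_converter().convert(path)` on your own PDF and then exporting `result.document` (for example with `export_to_markdown()`) should show whether the larger model changes the table output.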
@maxmnemonic I tried it out, but it didn't catch the nuances. Result:
I see, the header is misaligned with the content of the table (the header text of one column sits above the content of another column). Thanks for the input. We need to consider whether we can introduce distortions like these into the synthetic training data to increase model robustness in the future.
We first need to leverage the word-level bounding boxes together with the accurate TableFormer.
depends on #285
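To illustrate the word-level bounding box idea mentioned above, here is a rough sketch that assigns each word to the column it overlaps most horizontally. This is not Docling's implementation: the `Word` type, the `assign_to_columns` helper, and the column ranges are all hypothetical and only show the general principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word with its horizontal extent (hypothetical type for illustration)."""
    text: str
    x0: float
    x1: float


def horizontal_overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def assign_to_columns(words, columns):
    """Group word texts by the column (x0, x1) with the largest horizontal overlap.

    Words that overlap no column are dropped.
    """
    assigned = [[] for _ in columns]
    for word in words:
        scores = [horizontal_overlap(word.x0, word.x1, c0, c1) for c0, c1 in columns]
        best = max(range(len(columns)), key=scores.__getitem__)
        if scores[best] > 0:
            assigned[best].append(word.text)
    return assigned


# Two columns; the header word "Speed" starts slightly left of its own column.
columns = [(0.0, 50.0), (50.0, 100.0)]
header_words = [Word("Mass", 5.0, 30.0), Word("(kg)", 32.0, 48.0), Word("Speed", 45.0, 75.0)]
header_cells = assign_to_columns(header_words, columns)
```

Here "Speed" overlaps the first column by 5 units and the second by 25, so it lands in the second column even though it starts inside the first.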
Requested feature
Enhanced table extraction for complex table formats. Currently, Docling identifies the values correctly, but the formatting is sometimes misaligned or unclear, especially in tables with multi-line headers, merged cells, or specific symbols. This affects the readability and usability of the output, particularly for scientific or technical tables with detailed data.
Examples:
Alternatives