DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License

cli and PDF: wrong table output #268

Open aborruso opened 1 week ago

aborruso commented 1 week ago

Bug

In the output, the content of one row of the input table is mixed with that of another row.

The input

image

The output

image

Steps to reproduce

Using this file, run:

docling --no-ocr --table-mode accurate table.pdf
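
To reproduce from Python instead of the CLI, the sketch below sets up what should be the equivalent configuration: OCR disabled and TableFormer in accurate mode. The function name `convert_without_ocr_accurate` is made up for illustration, and the mapping from the CLI flags to these pipeline options is an assumption, not something confirmed in this thread.

```python
def convert_without_ocr_accurate(pdf_path):
    """Convert a PDF with OCR off and TableFormer in accurate mode.

    Hypothetical helper: assumed to mirror
    `docling --no-ocr --table-mode accurate <file>`.
    """
    # Imported here so that defining the helper does not require docling.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import (
        PdfPipelineOptions,
        TableFormerMode,
    )
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = False                  # assumed equivalent of --no-ocr
    options.do_table_structure = True       # keep table structure recognition on
    options.table_structure_options.mode = TableFormerMode.ACCURATE

    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(pipeline_options=options)
        }
    )
    result = converter.convert(pdf_path)
    # Markdown export makes the mixed-up table row easy to spot.
    return result.document.export_to_markdown()
```

Calling `print(convert_without_ocr_accurate("table.pdf"))` in an environment with docling installed should show the same table as the CLI run, if the option mapping above is right.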

Docling version

Docling version: 2.4.0
Docling Core version: 2.3.1
Docling IBM Models version: 2.0.3
Docling Parse version: 2.0.2
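
For others who want to compare their environment with the versions above, here is a small sketch that reads the installed versions with the standard library. The distribution names in `PACKAGES` are assumed to be the PyPI names of the four components listed; `installed_versions` is a made-up helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI distribution names for the components listed above.
PACKAGES = ["docling", "docling-core", "docling-ibm-models", "docling-parse"]


def installed_versions(packages=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```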

Python version

Python 3.11.2

PeterStaar-IBM commented 6 days ago

We first need to use the word-level bounding boxes together with the accurate TableFormer model.
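
To illustrate the general idea only, here is a toy sketch of matching word-level bounding boxes to predicted table cells: each word goes to the cell it overlaps most, so a word cannot leak into a neighbouring row. This is not docling's implementation. The `Box` type, the function names, the coordinates (top-left origin, y growing downward) and the sample values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; origin at top-left, y grows downward (assumption)."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a, b):
    """Area of the intersection of two boxes, 0.0 if they do not overlap."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0.0) * max(height, 0.0)


def assign_words_to_cells(words, cells):
    """Assign each word to the cell with which it overlaps the most.

    words: list of (text, Box)
    cells: dict mapping (row, col) to Box
    Returns a dict mapping (row, col) to the joined cell text.
    """
    buckets = {key: [] for key in cells}
    for text, box in words:
        best = max(cells, key=lambda key: overlap_area(box, cells[key]))
        if overlap_area(box, cells[best]) > 0:
            # Keep the position so words can be put back in reading order.
            buckets[best].append((box.y0, box.x0, text))
    return {
        key: " ".join(text for _, _, text in sorted(items))
        for key, items in buckets.items()
    }


# Toy 2x2 table with invented values.
cells = {
    (0, 0): Box(0, 0, 50, 10), (0, 1): Box(50, 0, 100, 10),
    (1, 0): Box(0, 10, 50, 20), (1, 1): Box(50, 10, 100, 20),
}
words = [
    ("Milan", Box(5, 12, 32, 18)), ("1396", Box(55, 12, 80, 18)),
    ("Rome", Box(5, 2, 30, 8)), ("2873", Box(55, 2, 80, 8)),
]
table = assign_words_to_cells(words, cells)
print(table)
```

With line-level boxes, text from two rows can share one box and end up in the same cell; with word-level boxes, each word is matched independently.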

Depends on https://github.com/DS4SD/docling/issues/285