dmh43 closed this 10 months ago.
Looks like this will improve things, but I'm wondering whether we could take advantage of PDFPlumber's line segmentation (rows) downstream to decide how to draw boxes.
Just realized a version bump is needed here. I'm setting it to 0.9.11 in my PR, so you can take 0.9.12.
This PR groups boxes into lines without assuming an exact match of `box.t`. We compare it at the 3rd decimal place, which seems small enough to keep separate lines apart but also big enough to catch most cases.

This PR also changes how adjacent boxes belonging to a mention are merged when the mention has 2 spans that are far apart. Instead of not doing anything in that case, it only merges boxes whose associated spans are close.
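The two steps in the description can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Box`, its `l`/`r`/`b` fields, the `start`/`end` character offsets, `span_gap`, and `max_char_gap` are hypothetical names, and measuring "close" as the number of characters between two spans is an assumption.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Box:
    """Hypothetical box: page coordinates plus the character span it covers."""
    l: float
    t: float
    r: float
    b: float
    start: int  # character offset where the box's span begins
    end: int    # character offset where the box's span ends (exclusive)


def line_key(box: Box, ndigits: int = 3) -> float:
    # Round the top edge so tiny float differences do not split one line in two.
    return round(box.t, ndigits)


def group_into_lines(boxes: list[Box], ndigits: int = 3) -> list[list[Box]]:
    """Group boxes by rounded top edge; each line is sorted left to right."""
    ordered = sorted(boxes, key=lambda bx: (line_key(bx, ndigits), bx.l))
    return [list(g) for _, g in groupby(ordered, key=lambda bx: line_key(bx, ndigits))]


def span_gap(a: Box, b: Box) -> int:
    # Characters between the two spans; zero or negative means they touch or overlap.
    return max(a.start, b.start) - min(a.end, b.end)


def merge_close_boxes(line: list[Box], max_char_gap: int = 1) -> list[Box]:
    """Merge neighbouring boxes on one line only when their spans are close in the text."""
    merged: list[Box] = []
    for box in line:
        if merged and span_gap(merged[-1], box) <= max_char_gap:
            prev = merged[-1]
            # Replace the previous box with the union of both boxes and both spans.
            merged[-1] = Box(
                l=min(prev.l, box.l), t=min(prev.t, box.t),
                r=max(prev.r, box.r), b=max(prev.b, box.b),
                start=min(prev.start, box.start), end=max(prev.end, box.end),
            )
        else:
            # Spans are far apart (e.g. the two halves of a split mention): keep separate.
            merged.append(box)
    return merged
```

One caveat with rounding: tops of 0.2504 and 0.2506 round to 0.25 and 0.251 and would land on different lines despite being only 0.0002 apart. If that turns out to matter, comparing each box against the running line's top with a tolerance avoids the boundary effect.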