BobLd / tabula-sharp

Extract tables from PDF files (port of tabula-java)
MIT License

Fix the last cell not being properly picked by SpreadsheetExtractionAlgorithm #32

Closed. MadderThenMad closed 7 months ago

MadderThenMad commented 7 months ago

PR for the issue described in https://github.com/BobLd/tabula-sharp/issues/25.

Inspecting the contents of the horizontalR collection in SpreadsheetExtractionAlgorithm (line 116) shows that the first boundary is the only one with its X coordinates in reverse order.

That boundary, the bottom boundary of the page, was inserted with reversed X coordinates. This caused SpreadsheetExtractionAlgorithm to miss the cells in the last row on some PDFs.
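
To illustrate the idea, here is a minimal sketch in Python rather than the project's C#. It is not the code from this PR: the Ruling class, the normalize_horizontal and reversed_indices helpers, and the coordinates are all hypothetical, and it only assumes that a horizontal boundary is stored as two endpoints. It shows how a single boundary with right-to-left X coordinates can be detected and reordered so that every horizontal boundary runs left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ruling:
    """Hypothetical stand-in for a ruling line; not the tabula-sharp type."""
    x1: float
    y1: float
    x2: float
    y2: float


def normalize_horizontal(r: Ruling) -> Ruling:
    """Return the ruling with its endpoints ordered left to right."""
    if r.x1 <= r.x2:
        return r
    # Swap the endpoints so that x1 is the left end.
    return Ruling(r.x2, r.y2, r.x1, r.y1)


def reversed_indices(rulings):
    """Indices of rulings whose X coordinates run right to left."""
    return [i for i, r in enumerate(rulings) if r.x1 > r.x2]


# Made-up coordinates: the page-bottom boundary comes first and is reversed.
horizontal = [
    Ruling(612.0, 792.0, 0.0, 792.0),
    Ruling(50.0, 100.0, 500.0, 100.0),
    Ruling(50.0, 200.0, 500.0, 200.0),
]

fixed = [normalize_horizontal(r) for r in horizontal]
```

In this made-up data only the first entry is reversed, matching the observation above; after normalization no entry is. Whether the actual fix reorders the coordinates at insertion time or normalizes them later is not stated here.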