JonathanLink / PDFLayoutTextStripper

Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
https://jonathanlink.ch/PDFLayoutTextStripper.html
Apache License 2.0
1.57k stars, 208 forks

difference with pdftotext -layout? #2

Closed by martinszy 7 years ago

martinszy commented 7 years ago

What is the difference between this library and the pdftotext command?

JonathanLink commented 7 years ago

Not a lot. During development I often compared my results with the output of the pdftotext command. The main motivation behind the project is to have equivalent Java code that is easier to embed in your Java desktop apps or Android apps.
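
For readers wondering what "keeping the layout" means in practice, here is a minimal, self-contained sketch of the general idea: text fragments with page coordinates are placed on a character grid so that columns stay aligned. This is not the code of PDFLayoutTextStripper or of pdftotext. The `LayoutSketch` class, the `Fragment` type, and the `CHAR_WIDTH` and `LINE_HEIGHT` constants are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LayoutSketch {

    /** Hypothetical text fragment: position in points plus its text. */
    public static final class Fragment {
        final double x;
        final double y;
        final String text;

        public Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Assumed average glyph width and line height in points (illustrative only).
    static final double CHAR_WIDTH = 6.0;
    static final double LINE_HEIGHT = 12.0;

    /** Renders fragments as plain text, mapping x to a column and y to a row. */
    public static String render(List<Fragment> fragments) {
        // Group fragments by row index; TreeMap keeps rows ordered top to bottom.
        Map<Integer, List<Fragment>> rows = new TreeMap<>();
        for (Fragment f : fragments) {
            int row = (int) Math.round(f.y / LINE_HEIGHT);
            rows.computeIfAbsent(row, k -> new ArrayList<>()).add(f);
        }

        StringBuilder out = new StringBuilder();
        for (List<Fragment> row : rows.values()) {
            // Process each row left to right.
            row.sort((a, b) -> Double.compare(a.x, b.x));
            StringBuilder line = new StringBuilder();
            for (Fragment f : row) {
                int column = (int) Math.round(f.x / CHAR_WIDTH);
                // Pad with spaces up to the fragment's target column.
                while (line.length() < column) {
                    line.append(' ');
                }
                // If earlier text already overran this column, separate with one space.
                if (line.length() > column) {
                    line.append(' ');
                }
                line.append(f.text);
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Fragment> table = Arrays.asList(
                new Fragment(0, 0, "Item"),
                new Fragment(120, 0, "Price"),
                new Fragment(0, 12, "Coffee"),
                new Fragment(120, 12, "3.50"));
        System.out.print(render(table));
    }
}
```

With the assumed 6-point character width, both "Price" and "3.50" start at column 20, so the two table columns line up in the text output. A real stripper has to cope with varying font sizes, rotated text, and fragments that do not fall on a regular grid, which this sketch ignores.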