documentcloud / docsplit

Break Apart Documents into Images, Text, Pages and PDFs
http://documentcloud.github.com/docsplit/

Horizontal / table formatted text #136

Open nofxx opened 8 years ago

nofxx commented 8 years ago

I have some tables inside a PDF that I really need to parse (the alternative is 100 hours of monkey work). That's impossible without passing the -layout option to the PDF parser. This patch introduces the 'pdf_opts' param, and works as expected: https://github.com/documentcloud/docsplit/pull/114

Just found this one too: https://github.com/documentcloud/docsplit/pull/132
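
For anyone wanting to see what -layout does outside of Docsplit, here is a minimal Python sketch that calls pdftotext directly. It is not Docsplit's API and not the code from either pull request. The function names (build_pdftotext_command, extract_text) and the extra_opts parameter are hypothetical, and it assumes the pdftotext binary (Poppler or Xpdf) is installed and on the PATH.

```python
import shutil
import subprocess


def build_pdftotext_command(pdf_path, txt_path, extra_opts=("-layout",)):
    """Build the argv list for pdftotext (hypothetical helper).

    pdftotext expects its options before the input and output file names.
    """
    return ["pdftotext", *extra_opts, pdf_path, txt_path]


def extract_text(pdf_path, txt_path, extra_opts=("-layout",)):
    """Run pdftotext on pdf_path and write the text to txt_path.

    With -layout, pdftotext tries to preserve the physical layout of the
    page, which is what keeps table columns lined up in the output.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    command = build_pdftotext_command(pdf_path, txt_path, extra_opts)
    # check=True raises CalledProcessError if pdftotext exits non-zero.
    subprocess.run(command, check=True)
```

Calling extract_text("report.pdf", "report.txt") (file names made up) should write layout-preserving text; passing extra_opts=() gives the default reading-order output for comparison.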