PDF2JSON is a conversion library based on XPDF (3.02) that can be used for high-performance, page-by-page conversion of PDF to JSON and XML. It also supports compressing the output to minimize size. PDF2JSON is available for Windows, OS X and Linux. Please see https://flowpaper.com for more information.
Passages with spaces joined by periods rather than split into separate words #43
We have noticed an issue where, in some cases, pieces of our text are joined by periods into one massive word rather than split at spaces into individual array members, e.g.:
Not sure if you have any idea what might cause this: is it an issue in our PDFs, or something that pdf2json is getting wrong for some reason?
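Until the root cause is known, one possible stopgap is to post-process the extracted words. The sketch below is a hypothetical heuristic and not part of pdf2json. It assumes the words have already been pulled out of the JSON output into a plain list of strings (the exact pdf2json JSON layout is not assumed here). It treats a period with at least two letters on each side as a lost space. Tokens such as `e.g.`, `3.14` or a trailing `end.` are left alone, but URLs like `example.com` would also be split.

```python
import re

# Hypothetical heuristic: a period with at least two letters immediately on
# each side is assumed to be a word separator that was lost during extraction.
_JOINED_PERIOD = re.compile(r"(?<=[A-Za-z]{2})\.(?=[A-Za-z]{2})")


def split_joined_words(tokens):
    """Return a new list in which period-joined runs are split into words."""
    result = []
    for token in tokens:
        # re.split breaks the token at every matching period; drop empty parts.
        result.extend(part for part in _JOINED_PERIOD.split(token) if part)
    return result


print(split_joined_words(["We.have.noticed", "an", "issue"]))
```

This only masks the symptom in the consumer; it does not explain why the text was emitted joined by periods in the first place.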