PDF2JSON is a conversion library based on XPDF (3.02) which can be used for high-performance, page-by-page conversion of PDF files to JSON and XML. It also supports compressing data to minimize size. PDF2JSON is available for Windows, OSX and Linux. Please see https://flowpaper.com for more information.
Problem with content in one line being separated into multiple ones #46
I am trying to find signatures in a document by checking each line to see whether it contains the signature code. While doing that, I noticed that content which is on the same line in the document gets split into multiple lines for some reason. How can I tackle this?
2020-07-17T09:55:54.404Z adfcfc04-637c-4228-b256-6a5b3214308c INFO Signed%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6.
2020-07-17T09:55:54.404Z adfcfc04-637c-4228-b256-6a5b3214308c INFO SIGN_ABOVE_HERE
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO _
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO JOHN
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO _
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO Signed%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6%E2%80%A6.
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO SIGN
2020-07-17T09:55:54.423Z adfcfc04-637c-4228-b256-6a5b3214308c INFO _ABOVE_HERE_SUSAN_
I am attaching a doc so you can see what it looks like too:
doc.docx
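One possible way to tackle this, sketched under assumptions: if each extracted fragment carries a position, fragments that share (almost) the same vertical coordinate can be merged back into one line before searching for the marker. The field names `x`, `y` and `text`, the tolerance value, and the sample coordinates below are all hypothetical and not taken from the converter's actual output, so they would need to be mapped to whatever the JSON really contains. The percent-decoding step is included only because the log above shows percent-encoded text (`%E2%80%A6`).

```python
from urllib.parse import unquote


def merge_fragments(fragments, y_tolerance=0.1):
    """Merge text fragments that sit on (almost) the same baseline.

    `fragments` is a list of dicts with hypothetical keys "x", "y", "text".
    Returns one decoded string per reconstructed line, top to bottom.
    """
    lines = []  # each entry: {"y": anchor y, "parts": [(x, text), ...]}
    for frag in sorted(fragments, key=lambda f: (f["y"], f["x"])):
        # Compare against the y of the first fragment of the current line.
        if lines and abs(frag["y"] - lines[-1]["y"]) <= y_tolerance:
            lines[-1]["parts"].append((frag["x"], frag["text"]))
        else:
            lines.append({"y": frag["y"], "parts": [(frag["x"], frag["text"])]})

    # Within a line, order fragments left to right, then decode %XX escapes.
    return [
        "".join(unquote(text) for _, text in sorted(line["parts"]))
        for line in lines
    ]


# Made-up coordinates; the texts mirror the fragments seen in the log.
fragments = [
    {"x": 7.5, "y": 12.02, "text": "_ABOVE_HERE_SUSAN_"},
    {"x": 5.0, "y": 12.0, "text": "SIGN"},
    {"x": 5.0, "y": 10.0, "text": "Signed%E2%80%A6."},
]

merged = merge_fragments(fragments)
has_marker = [line for line in merged if "SIGN_ABOVE_HERE" in line]
```

With this sample input, `SIGN` and `_ABOVE_HERE_SUSAN_` end up in the same string, so a plain substring check for `SIGN_ABOVE_HERE` matches again. The tolerance probably needs tuning per document, since fragments in different fonts or sizes may report slightly different `y` values for the same visual line.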