JoshData / pdf-redactor

A general purpose PDF text-layer redaction tool for Python 2/3.
Creative Commons Zero v1.0 Universal
180 stars 61 forks source link

key-value pattern regex to identify key-value pairs #34

Open sandeep-tukaram opened 6 months ago

sandeep-tukaram commented 6 months ago

Had posted it on stack long time back... not sure if still relevant. Opening issue anyways.

https://stackoverflow.com/questions/62467452/in-a-pdf-want-to-identify-key-value-pairs-and-programatically-redact-values-onl

The question in case the link is not accessible...

I'm using joshdata redactor and key-value pattern regex to identify key-value pairs in a pdf text token stream. However, with some of the pdfs, thought the key-value pair appear adjacent when visually in a pdf document, in its token stream they appear far apart. Hence, doesn't get a match with key-value pattern.

How can this be solved?