einnairo opened this issue 5 years ago.
I think it's just a problem in your regular expression. The "." is a special character that matches any character, so it matches digits too: the whole pattern matches any four consecutive digits, in addition to one digit + "." + two digits. To match the decimal point itself, escape the dot as \\. (in a regular string you need two backslashes, because you want one actual backslash, but backslashes are special in Python strings...).
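A minimal console check of the point above, assuming plain Python 3 and the standard re module (the sample strings are illustrative only). It compares the unescaped pattern from red.py with an escaped one, written both as a normal string with two backslashes and as a raw string with one.

```python
import re

# The pattern from red.py: the unescaped "." matches ANY character, including digits.
loose = re.compile("[0-9](.)[0-9]{2}")

# Escaped versions: both spell the same regex (a backslash followed by ".").
escaped_plain = re.compile("[0-9](\\.)[0-9]{2}")  # normal string: two backslashes
escaped_raw = re.compile(r"[0-9](\.)[0-9]{2}")     # raw string: one backslash

# Same pattern text either way.
assert escaped_plain.pattern == escaped_raw.pattern

for s in ["1234", "12.00", "1x99"]:
    print(s, bool(loose.search(s)), bool(escaped_raw.search(s)))
# 1234 True False
# 12.00 True True
# 1x99 True False
```

Incidentally, a normal-string "\." happens to produce the same two characters, because Python leaves unrecognized escape sequences in place (recent versions emit a warning about it), so writing "\\." and "\." gives the same regex.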
Thanks for your reply.
I tried both:
(re.compile(u"[0-9]*(\\.)[0-9]{2}"), lambda m : ""),
(re.compile(u"[0-9]*(\.)[0-9]{2}"), lambda m : ""),
Neither solved the problem; both still remove whole numbers as well.
Could I send you a couple of the PDF examples I am using privately? I don't want to share the POs openly.
Sorry, I don't really have time to debug it with you.
I'm having trouble blanking out costs in formats like 12.00, 12345.98, or 123.76. The problem is that it also blanks out whole numbers in the PDFs, though not all of them, which seems really strange to me.
I suspect PDFs might "encode" whole numbers with decimals too, meaning that something displayed as 12 in a PDF is actually 12.00? Below is my code, adapted from example.py, which I run from the console.
red.py:
# encoding=utf-8
from pdf_redactor import redactor, RedactorOptions
import re

# Set options.
redactor_options = RedactorOptions()
redactor_options.content_filters = [
    (re.compile(u"Cost Price"), lambda m: ""),
    (re.compile(u"Cost"), lambda m: ""),
    (re.compile(u"[0-9](.)[0-9]{2}"), lambda m: ""),  # this is my regex for costs with 2 decimals
    (re.compile(u"Value Price"), lambda m: ""),
]
redactor_options.content_replacement_glyphs = ['#', '', '/', '-']
redactor(redactor_options)
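A sketch of a stricter cost pattern that could be tested in the console before putting it into content_filters. The pattern COST and the helper apply_filters are hypothetical: apply_filters is a simplified stand-in that runs each (pattern, function) pair over a plain string with Pattern.sub, not how pdf_redactor actually processes a PDF's text layer. The lookarounds keep the match from starting or ending in the middle of a longer run of digits, so "12.345" is left alone instead of being cut to "5".

```python
import re
from typing import Callable, List, Tuple

# Hypothetical stricter cost pattern: digits, a literal ".", exactly two digits,
# and no further digits immediately before or after the match.
COST = re.compile(r"(?<![0-9])[0-9]+\.[0-9]{2}(?![0-9])")

# A filter pair in the same (pattern, function) shape used in red.py.
Filter = Tuple[re.Pattern, Callable[[re.Match], str]]


def apply_filters(text: str, filters: List[Filter]) -> str:
    """Hypothetical helper: apply each (pattern, function) pair in order to a
    plain string. Not pdf_redactor's implementation."""
    for pattern, replace in filters:
        text = pattern.sub(replace, text)
    return text


filters: List[Filter] = [
    (re.compile("Cost Price"), lambda m: ""),
    (COST, lambda m: ""),
]

print(repr(apply_filters("Qty 12 Cost Price 12345.98 Ref 2020", filters)))
# 'Qty 12   Ref 2020'
```

Checking candidate patterns against sample text this way can help tell whether the regex itself matches the whole numbers that disappear, before running the full redaction.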
Running python3 red.py < a.pdf > anew.pdf does not work as expected for me.
I would appreciate it if anyone could help.