tridemax closed this issue 6 years ago
Thanks. That should simplify things. But it will take some testing to update this library. Would be glad to have some help.
Not a problem - I’ve tried to fix it right away but had some issues, probably because of my loose understanding of the Unicode differences between Python 2 and 3. Testing and fixing should be a bit easier if you can provide a general direction.
There are a few places that I had to work around weird Latin-1 encoding:
https://github.com/JoshData/pdf-redactor/search?utf8=%E2%9C%93&q=latin&type=
Those parts might be no longer necessary, which would be great.
Other than that, offhand I don't really know. :)
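For context on the Latin-1 workarounds mentioned above, here is a minimal sketch of why Latin-1 tends to show up when raw bytes have to travel through text strings. It is an assumption about the motivation, not a copy of the pdf-redactor code: Latin-1 maps every byte value 0-255 to the code point with the same number, so any byte sequence survives a decode/encode round trip unchanged.

```python
# Latin-1 maps byte value N to code point U+00NN for every N in 0..255,
# so it can carry arbitrary bytes through a str without loss.
raw = bytes(range(256))

# Decoding never fails, because every byte value has a mapping.
as_text = raw.decode("latin-1")

# Each character's code point equals the original byte value.
code_points = [ord(ch) for ch in as_text]

# Encoding back with the same codec restores the exact bytes.
round_tripped = as_text.encode("latin-1")
```

If pdfrw now hands back Unicode strings directly, a round trip like this may no longer be needed, which matches the hope expressed above.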
Hey @JoshData
I am facing an issue executing example.py with the test PDF, i.e. /tests/test-ssns.pdf
Error:
Traceback (most recent call last):
  File ".\example.py", line 47, in <module>
    pdf_redactor.redactor(options)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 110, in redactor
    text_layer = build_text_layer(document, options)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 488, in build_text_layer
    prev_token[i] = make_mutable_string_token(prev_token[i])
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 460, in make_mutable_string_token
    token = TextToken(token.decode(), current_font)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 410, in __init__
    self.original_value = toUnicode(value, font, fontcache)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 699, in toUnicode
    fontcache[font.ToUnicode.stream] = CMap(font.ToUnicode)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 639, in __init__
    add_mapping(code_to_int(code), char)
  File "E:\BankerBay\Python\PDFEdit\pdf-redactor-master\pdf_redactor.py", line 560, in add_mapping
    code = bytes([code])
ValueError: bytes must be in range(0, 256)
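The last frame of the traceback can be reproduced in isolation. In Python 3, `bytes([code])` builds a single byte and raises `ValueError` whenever `code` is above 255, which is what happens when a character code does not fit in one byte. The sketch below shows the failure and one possible way to encode larger codes; `code_to_bytes` is a hypothetical helper, not a function from pdf-redactor, and the thread does not say whether the eventual fix works this way.

```python
def code_to_bytes(code: int) -> bytes:
    """Hypothetical helper: encode a non-negative code as big-endian bytes.

    Uses the smallest number of bytes that can hold the value (at least one).
    Assumption: a real CMap may require a fixed code width instead.
    """
    length = max(1, (code.bit_length() + 7) // 8)
    return code.to_bytes(length, "big")


# Reproduce the error from the traceback: 0x0100 does not fit in one byte.
try:
    bytes([0x0100])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Codes up to 255 still produce a single byte; larger codes get more bytes.
single = code_to_bytes(0x41)
double = code_to_bytes(0x0100)
```

Picking the minimal width is only one option. If the font's code space defines a fixed width, padding to that width would be the safer choice.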
Can you please help me fix this?
Thanks
Unfortunately I'm not going to have time to look into it for at least a few weeks, sorry.
I've pushed a fix for using pdfrw 0.4. Let me know if it solves your problem!
Thanks for reporting the issue.
As pdfrw now uses Unicode for PdfString (https://github.com/pmaupin/pdfrw/commit/d8a9292ad651dfdfc674f38121198cf1bc10240d) pdf-redactor fails with an error on this new version: