JoshData / pdf-redactor

A general purpose PDF text-layer redaction tool for Python 2/3.
Creative Commons Zero v1.0 Universal

Crashes on large files #23

Open tkzv opened 4 years ago

tkzv commented 4 years ago

The smoke test runs without errors on smaller files, but crashes on http://www.stolyarov.info/books/pdf/progintro_vol1.pdf:

$ ./smoketest.py progintro_vol1.pdf
./smoketest.py:69: TqdmExperimentalWarning: GUI is experimental/alpha
  for fn in tqdm(list(gen_filenames(paths))):
IndexError while reading progintro_vol1.pdf
Traceback (most recent call last):
  File "./smoketest.py", line 40, in smoke_test_file
    pdf_redactor.redactor(options)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 101, in redactor
    text_layer = build_text_layer(document, options)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 451, in build_text_layer
    prev_token[i] = make_mutable_string_token(prev_token[i])
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 423, in make_mutable_string_token
    token = TextToken(token.to_bytes(), current_font)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 373, in __init__
    self.original_value = toUnicode(value, font, fontcache)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 647, in toUnicode
    fontcache[font.ToUnicode.stream] = CMap(font.ToUnicode)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 586, in __init__
    add_mapping(code, cid_or_name1, code-code1)
  File "/home/oleg/pdf-redactor/pdf-redactor-master/pdf_redactor.py", line 547, in add_mapping
    char = char[0:-1] + (chr if sys.version_info >= (3,) else unichr)(ord(char[-1]) + offset)
IndexError: string index out of range
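
The last frame of the traceback suggests that `char` was an empty string when `add_mapping` evaluated `char[-1]`, since that is the only indexing operation on that line. Below is a minimal sketch of that failure and of one possible guard. It assumes Python 3 only, and `shift_last_char` is a hypothetical helper written for illustration, not a function in `pdf_redactor.py`. Whether skipping empty mappings is the right behaviour for this PDF's ToUnicode CMap is also an assumption.

```python
def shift_last_char(char, offset):
    """Return `char` with its final code point advanced by `offset`.

    Hypothetical helper mirroring the expression on line 547 of the
    traceback; it is not part of pdf_redactor.
    """
    if not char:
        # An empty string has no last character, so char[-1] would raise
        # IndexError. Returning it unchanged is an assumed policy.
        return char
    return char[:-1] + chr(ord(char[-1]) + offset)


# Reproduce the failing operation: indexing the last element of "".
empty = ""
raised_index_error = False
try:
    empty[-1]
except IndexError:
    raised_index_error = True

print(raised_index_error)
print(shift_last_char("A", 2))
print(repr(shift_last_char("", 2)))
```

If the empty value comes from a malformed or unusual mapping entry rather than from the file's size, the crash may depend on the fonts embedded in this particular PDF and not on how large it is.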