Problem with parsing PDF table with unicode characters

OlegGavrilov commented 5 years ago

Hello! Sorry for reporting a minor issue, but when I tried to parse a table containing Unicode characters using the Excalibur front-end, I got an error:

ERROR:root:'ascii' codec can't encode character u'\xf6' in position 376: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/excalibur/tasks.py", line 123, in extract
    tables.export(f_datapath, f=f, compress=True)
  File "/usr/local/lib/python2.7/dist-packages/camelot/core.py", line 701, in export
    self._write_file(f=f, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/camelot/core.py", line 659, in _write_file
    to_format(filepath)
  File "/usr/local/lib/python2.7/dist-packages/camelot/core.py", line 594, in to_html
    f.write(html_string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 376: ordinal not in range(128)

I fixed that by adding .encode('utf-8') to the html_string written at core.py:594.

I don't know if this is a good fix, but I hope it can help someone.
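
For anyone hitting the same error, here is a minimal sketch of a related approach: instead of encoding the string by hand, open the output file with an explicit encoding so the file object does the encoding. The helper name write_html and the sample table are hypothetical and only for illustration; this is not camelot's actual code, and I have not checked how later camelot versions handle it.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_html(filepath, html_string):
    """Write a text (unicode) string to filepath as UTF-8.

    Hypothetical helper for illustration; not camelot's actual code.
    """
    # io.open accepts an encoding on both Python 2 and Python 3, so the
    # file object encodes the text itself instead of falling back to ASCII.
    with io.open(filepath, "w", encoding="utf-8") as f:
        f.write(html_string)


# Round-trip a string containing u'\xf6', the character from the traceback.
html = u"<table><tr><td>K\xf6ln</td></tr></table>"
path = os.path.join(tempfile.mkdtemp(), "table.html")
write_html(path, html)

with io.open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

On Python 2.7 the built-in open() returns a byte-oriented file, so writing a unicode string triggers an implicit ASCII encode, which is where the UnicodeEncodeError comes from. Passing the encoding to io.open avoids that implicit step.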

Thanks for the amazing project!

ngenovictor commented 5 years ago

I got the same error, and the same change worked for me.

vinayak-mehta commented 4 years ago

Closing because there's no PDF to reproduce this issue.

atlanhq / camelot, issue #322