atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 8-9: ordinal not in range(256) #231

Closed: ravedata closed this issue 5 years ago

ravedata commented 5 years ago

While reading a PDF in Japanese, I get the following error.

Note: it worked for earlier Japanese-language PDFs.

Traceback (most recent call last):
  File "", line 14, in 
    tables = camelot.read_pdf(filename,pages='1-end')
  File "C:\Users\hp\Anaconda3\lib\site-packages\camelot\io.py", line 101, in read_pdf
    tables = p.parse(flavor=flavor, **kwargs)
  File "C:\Users\hp\Anaconda3\lib\site-packages\camelot\handlers.py", line 149, in parse
    self._save_page(self.filename, p, tempdir)
  File "C:\Users\hp\Anaconda3\lib\site-packages\camelot\handlers.py", line 105, in _save_page
    outfile.write(f)
  File "C:\Users\hp\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 501, in write
    obj.writeToStream(stream, key)
  File "C:\Users\hp\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 549, in writeToStream
    value.writeToStream(stream, encryption_key)
  File "C:\Users\hp\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 472, in writeToStream
    stream.write(b_(self))
  File "C:\Users\hp\Anaconda3\lib\site-packages\PyPDF2\utils.py", line 238, in b_
    r = s.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 8-9: ordinal not in range(256)

ravedata commented 5 years ago

140120181126441404.pdf

I have uploaded the PDF I am working with.

vinayak-mehta commented 5 years ago

@vedantnarayan Looks like there's some problem with the PDF. I fixed it using qpdf and was able to extract tables.

$ qpdf --decrypt 140120181126441404.pdf fixed.pdf
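
The same fix can be scripted before calling Camelot. This is a minimal sketch, assuming the `qpdf` executable is installed and on `PATH`; the helper names `qpdf_decrypt_command`, `repair_pdf`, and `extract_tables` are hypothetical and not part of Camelot or qpdf.

```python
import shutil
import subprocess
from pathlib import Path


def qpdf_decrypt_command(src, dst):
    """Build the argument list for `qpdf --decrypt SRC DST`."""
    return ["qpdf", "--decrypt", str(src), str(dst)]


def repair_pdf(src, dst):
    """Rewrite `src` through qpdf and return the path of the repaired copy.

    Assumes qpdf is on PATH. qpdf exits with 0 on success and 3 when it
    wrote the output but emitted warnings; other codes are treated as failure.
    """
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf executable not found on PATH")
    result = subprocess.run(qpdf_decrypt_command(src, dst))
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit code {result.returncode}")
    return Path(dst)


def extract_tables(src, dst="fixed.pdf"):
    """Repair the PDF first, then run the same call as in the traceback."""
    import camelot  # imported lazily so the helpers above work without it

    fixed = repair_pdf(src, dst)
    return camelot.read_pdf(str(fixed), pages="1-end")
```

For example, `extract_tables("140120181126441404.pdf")` would write `fixed.pdf` next to the script and pass that file to `camelot.read_pdf` instead of the original.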

xbchen711 commented 5 years ago

I have the same error. Have you worked it out?

XG1997 commented 4 years ago

> I have the same error. Have you worked it out?

I got the same problem :(

logoyoung commented 2 years ago

Same here.

pannchat commented 2 years ago

> @vedantnarayan Looks like there's some problem with the PDF. I fixed it using qpdf and was able to extract tables.
>
> $ qpdf --decrypt 140120181126441404.pdf fixed.pdf

Thank you so much.