atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

[Error] CJK support encoding issue #385

Open asherchoi opened 4 years ago

asherchoi commented 4 years ago

PdfReadWarning: Illegal character in Name Object [generic.py:489]
Traceback (most recent call last):
  File "parser.py", line 43, in <module>
    pprint(HanWhaLifeInsuranceParser.parse(path))
  File "parser.py", line 25, in parse
    tables = camelot.read_pdf(file_name)
  File "/home/choi/parserenv/lib/python3.6/site-packages/camelot/io.py", line 117, in read_pdf
    **kwargs
  File "/home/choi/parserenv/lib/python3.6/site-packages/camelot/handlers.py", line 165, in parse
    self._save_page(self.filepath, p, tempdir)
  File "/home/choi/parserenv/lib/python3.6/site-packages/camelot/handlers.py", line 116, in _save_page
    layout, dim = get_page_layout(fpath)
  File "/home/choi/parserenv/lib/python3.6/site-packages/camelot/utils.py", line 795, in get_page_layout
    document = PDFDocument(parser)
  File "/home/choi/parserenv/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 566, in __init__
    xref.load(parser)
  File "/home/choi/parserenv/lib/python3.6/site-packages/pdfminer/pdfdocument.py", line 195, in load
    (_, obj) = parser.nextobject()
  File "/home/choi/parserenv/lib/python3.6/site-packages/pdfminer/psparser.py", line 597, in nextobject
    raise PSSyntaxError('Invalid dictionary construct: %r' % objs)
pdfminer.psparser.PSSyntaxError: Invalid dictionary construct: [/'Type', /'Font', /'Subtype', /'TrueType', /'Name', /'F1', /'BaseFont', /b"b'", /"ABCDEE+\xb9\xd9\xc5\xc1'", /'Encoding', /'WinAnsiEncoding', /'FontDescriptor', , /'FirstChar', 32, /'LastChar', 126, /'Widths', ]

I think camelot has an encoding compatibility issue with pdfminer. I cannot read my PDF file because of the error above. I can read the text using pdfminer directly, but pdfminer cannot extract tables automatically, so I want to use camelot.
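To make the encoding angle concrete, the non-ASCII bytes in the `BaseFont` value of the error message can be inspected with the standard library alone. This is only a sketch: it assumes the bytes are a Korean font name encoded in CP949 (a superset of EUC-KR), which the traceback itself does not confirm, and the variable names are illustrative.

```python
# Raw bytes copied from the BaseFont value in the error message above.
raw_name = b"ABCDEE+\xb9\xd9\xc5\xc1"

# Subset-font names have the form "<six-letter tag>+<font name>".
tag, _, font_bytes = raw_name.partition(b"+")

# True only if every byte is 7-bit ASCII; these bytes are not.
is_ascii = all(byte < 0x80 for byte in font_bytes)

# Assumption: the font name is CP949-encoded Korean text.
font_name = font_bytes.decode("cp949")

print(tag.decode("ascii"), is_ascii, font_name)
```

Under that assumption the bytes decode to 바탕 (Batang), a common Korean font. That would suggest the PDF stores a font name as raw non-ASCII bytes inside a name object, which matches the "Illegal character in Name Object" warning that precedes the parser error.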

How can I resolve this error?