Open littlebuddha16 opened 2 days ago
It parses the PDF, detects the tables, and recognizes their structure, but then exits with the following error.
ParseError: not well-formed (invalid token): line 1, column 46 (<string>)

Traceback (most recent call last):
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3577 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  Cell In[3], line 7
    ex.extract(input_file=input_file, output_dir=output_dir, mode=ex.Mode.PRESENTATION)
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/extractable/Extractor.py:13 in extract
    return extract_using_TATR(input_file, output_dir, output_filetype, mode)
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/extractable/Extractor.py:34 in extract_using_TATR
    pipeline(data_object)
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/toolz/functoolz.py:489 in __call__
    ret = f(ret)
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/toolz/functoolz.py:489 in __call__
    ret = f(ret)
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/extractable/TextExtractor.py:99 in process
    table_xml = ET.fromstring(table.to_xml_with_coords())
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/extractable/Datatypes/Table.py:37 in to_xml_with_coords
    row_element = ET.fromstring(row.to_xml_with_coords())
  File ~/Documents/GitHub/pdf_two_table/extract_table/lib/python3.10/site-packages/extractable/Datatypes/Row.py:40 in to_xml_with_coords
    cell_element = ET.fromstring(cell.to_xml_with_coords())
  File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/xml/etree/ElementTree.py:1342 in XML
    parser.feed(text)
  File <string>
ParseError: not well-formed (invalid token): line 1, column 46
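
The last frames show `ET.fromstring` rejecting the string returned by `cell.to_xml_with_coords()`. One possible cause (an assumption, not confirmed against the library source) is that the cell's text contains characters such as `&` or `<` that are placed into the XML string without escaping. The sketch below reproduces that class of error with the standard library only. `build_cell_xml`, its attribute names, and the sample text are hypothetical stand-ins, not extractable's actual code.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_cell_xml(text: str, escape_text: bool) -> str:
    """Hypothetical cell serializer; NOT the library's real implementation."""
    # escape() replaces &, < and > with their XML entities.
    body = escape(text) if escape_text else text
    return f'<cell x1="0" y1="0" x2="10" y2="10">{body}</cell>'


def try_parse(xml_string: str):
    """Return the parsed element, or None if the string is not well-formed."""
    try:
        return ET.fromstring(xml_string)
    except ET.ParseError as err:
        print(f"ParseError: {err}")
        return None


# Assumed example of text that a table cell might contain.
cell_text = "Revenue & costs < 5%"

raw = try_parse(build_cell_xml(cell_text, escape_text=False))   # fails to parse
safe = try_parse(build_cell_xml(cell_text, escape_text=True))   # parses cleanly

if safe is not None:
    print(safe.text)  # the parser decodes the entities back to the original text
```

If this is the cause, printing the string returned by `cell.to_xml_with_coords()` for the failing cell and looking at column 46 should show the offending character.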