INFO:txtfrompdf.__main__:Starting extraction with txt-from-pdf
INFO:txtfrompdf.__main__:Input path: fedex-small50pg.pdf
INFO:txtfrompdf.__main__:Found 1 PDF files
INFO:txtfrompdf.__main__:Extracting: fedex-small50pg.pdf
WARNING:pypdf._reader:Overwriting cache for 0 171
WARNING:pypdf._reader:Overwriting cache for 0 171
WARNING:pypdf.generic._data_structures:PdfReadError("Invalid Elementary Object starting with b's' @556682: b'y\\xf1\\xb8s/\\x1e\\xd6_\\xb8\\xf8\\x1b\\x9b\\xfcB\\xaf\\r\\nendstream\\rendobj\\r17 0 obj\\r<</Contents 18 0 R/CropBox[0.0 0.0 61'")
WARNING:pypdf._reader:Overwriting cache for 0 116
ERROR:txtfrompdf.extract:Generated an exception: 'Length1' for page C:\users\USERNAME\AppData\Local\Temp\tmp6cwjg9ak\18.pdf
Traceback (most recent call last):
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\utils.py", line 21, in temp_directory
yield temp_dir
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\extract.py", line 117, in _extract_txt_from_pdf
raise exc
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\extract.py", line 113, in _extract_txt_from_pdf
texts[page] = future.result()
^^^^^^^^^^^^^^^
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
raise self._exception
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\extract.py", line 84, in pdf_to_text
interpreter.process_page(page)
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdfinterp.py", line 997, in process_page
self.render_contents(page.resources, page.contents, ctm=ctm)
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdfinterp.py", line 1014, in render_contents
self.init_resources(resources)
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdfinterp.py", line 384, in init_resources
self.fontmap[fontid] = self.rsrcmgr.get_font(objid, spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdfinterp.py", line 216, in get_font
font = PDFType1Font(self, spec)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdffont.py", line 1009, in __init__
length1 = int_value(self.fontfile["Length1"])
~~~~~~~~~~~~~^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\pdfminer\pdftypes.py", line 285, in __getitem__
return self.attrs[name]
~~~~~~~~~~^^^^^^
KeyError: 'Length1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Scripts\txt-from-pdf.exe\__main__.py", line 7, in <module>
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\__main__.py", line 68, in cli_main
text = extract_txt_from_pdf(pdf, process_output=not args.no_filter)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\extract.py", line 144, in extract_txt_from_pdf
text = _extract_txt_from_pdf(
^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\extract.py", line 101, in _extract_txt_from_pdf
with temp_directory() as temp_dir:
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\contextlib.py", line 158, in __exit__
self.gen.throw(value)
File "C:\users\USERNAME\OneDrive - CONAME\Documents\PythonProjects\CONAME\Test\Fedexpdf\.venv\Lib\site-packages\txtfrompdf\utils.py", line 23, in temp_directory
shutil.rmtree(temp_dir)
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 808, in rmtree
return _rmtree_unsafe(path, onexc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 636, in _rmtree_unsafe
onexc(os.unlink, fullname, err)
File "C:\users\USERNAME\AppData\Local\Programs\Python\Python312\Lib\shutil.py", line 634, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\chuck\\AppData\\Local\\Temp\\tmp6cwjg9ak\\13.pdf'
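I'm on Windows 10 with Python 3.12.
Running the CLI program example on fedex-small50pg.pdf gives many errors (full output above). Two failures stand out: a KeyError: 'Length1' raised in pdfminer while loading a Type1 font for page 18, and then a PermissionError [WinError 32] when the temp directory cleanup tries to delete 13.pdf while it is still in use.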
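
My guess about the PermissionError is that the cleanup step runs shutil.rmtree while other page files are still open, and on Windows that second exception ends up stacked on top of the real one (the KeyError). Below is a minimal sketch of a more tolerant cleanup, only to illustrate the idea. The name tolerant_temp_directory is hypothetical and not part of txt-from-pdf, and I have not checked how the library's own temp_directory is actually written beyond what the traceback shows.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def tolerant_temp_directory():
    """Hypothetical helper (not part of txt-from-pdf).

    Yields a temporary directory and removes it afterwards without
    raising if some files cannot be deleted.
    """
    temp_dir = Path(tempfile.mkdtemp())
    try:
        yield temp_dir
    finally:
        # ignore_errors=True makes rmtree skip files it cannot delete
        # (for example files still locked on Windows) instead of raising,
        # so an exception raised inside the with-block is not masked.
        shutil.rmtree(temp_dir, ignore_errors=True)


# Small demonstration: the error raised inside the block is the one
# that reaches the caller.
try:
    with tolerant_temp_directory() as tmp:
        (tmp / "18.pdf").write_bytes(b"%PDF-1.4")
        raise KeyError("Length1")
except KeyError as exc:
    print("original error:", repr(exc))
```

This would not fix the 'Length1' failure itself. It would only keep the cleanup from adding a second traceback, at the cost of possibly leaving locked files behind in the temp folder.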