Open tehabstract opened 2 years ago
@deanmalmgren Any chance this could get looked into? Python 2 reached end of life on January 1, 2020, and the older packages that textract requires for 2.7 compatibility cause dependency conflicts. In particular, our team would appreciate bumping pdfminer.six to a newer version.
pdfminer.six >= 20200726 is required by unstructured, which in turn is required by langchain!
Quick note: I've tested this patch lightly, and the only problem I've found so far relates to a change in Python's subprocess module, where the mswindows attribute is now named _mswindows:
diff --git a/textract/parsers/utils.py b/textract/parsers/utils.py
index 11ec8a1..efb0d9c 100755
--- a/textract/parsers/utils.py
+++ b/textract/parsers/utils.py
@@ -83,7 +83,7 @@ class ShellParser(BaseParser):
         """
         # run a subprocess and put the stdout and stderr on the pipe object
-        if subprocess.mswindows:
+        if subprocess._mswindows:
             startupinfo = subprocess.STARTUPINFO()
             startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
         else:
Otherwise it's been working well for me.
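One possible follow-up, since `_mswindows` is a private attribute of the subprocess module and could change again: the check can be made against `sys.platform` instead. The sketch below is only an illustration of that idea, not the code in this patch. The names `IS_WINDOWS` and `run_quietly` are hypothetical and do not exist in textract.

```python
import subprocess
import sys

# Public, documented way to detect Windows; avoids the private
# subprocess._mswindows attribute. (IS_WINDOWS is a hypothetical name.)
IS_WINDOWS = sys.platform == "win32"


def run_quietly(args):
    """Run a command and return (returncode, stdout, stderr).

    Hypothetical helper, not part of textract. On Windows it sets the
    STARTF_USESHOWWINDOW flag, as the snippet in the diff does; elsewhere
    startupinfo stays None, which Popen accepts.
    """
    startupinfo = None
    if IS_WINDOWS:
        startupinfo = subprocess.STARTUPINFO()
        startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW

    pipe = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        startupinfo=startupinfo,
    )
    stdout, stderr = pipe.communicate()
    return pipe.returncode, stdout, stderr
```

For example, `run_quietly([sys.executable, "-c", "print('ok')"])` runs a child Python interpreter and returns its exit code along with the captured output as bytes.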
This PR drops Python 2 support and loosens the dependency pins. Please comment if you would like the dependencies specified in a different format, or want any other changes, and I will adjust.
Introduced openpyxl for xlsx files. Updated 2 test files:
Updated Travis, Vagrant, and the Dockerfile in tests.
Bumped the version to 1.7.0 and added a changelog entry.
Thanks