deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Update requirements #239

Closed benman1 closed 5 years ago

benman1 commented 6 years ago

I had problems installing until changing the versions in requirements. Several didn't work as specified originally. I just installed the latest versions in most cases. These versions are what I ended up with.

tvrbanec commented 5 years ago

textract 1.6.1 has requirement chardet==2.3.0, but you'll have chardet 3.0.4 which is incompatible.
textract 1.6.1 has requirement six==1.10.0, but you'll have six 1.12.0 which is incompatible.

benman1 commented 5 years ago

Yeah, sorry - I didn't check the tests nor update. This has been a while now. I don't even remember what I did and why I ended up not using textract. I remember though: in the python world, this is one of the better tools.

tvrbanec commented 5 years ago

Please, can you check requirements? Most of them just need new numbers. :-)

benman1 commented 5 years ago

I've just had a look at it. Your error messages don't make any sense to me. You would only see the message if you didn't update some of the files listed above.

I've just cloned and installed my version - as in the listed commits or https://github.com/benman1/textract/ - with no problem whatsoever.

python setup.py install

tvrbanec commented 5 years ago

I have done that, but I still get the newer chardet and six modules, which other packages (such as gensim) need. Textract works with those newer versions of chardet and six, but it complains about them. Your requirements should list newer version numbers for them, because textract will still work.
textract 1.6.1 has requirement chardet==2.3.0, but you'll have chardet 3.0.4 which is incompatible.
textract 1.6.1 has requirement six==1.10.0, but you'll have six 1.12.0 which is incompatible.
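The warning quoted above comes down to an exact pin (==) rejecting any other version, where a range would accept it. Here is a minimal sketch of that check, assuming versions are plain dotted integers. The helpers parse_version and satisfies are hypothetical and written only for illustration; they are not pip's or textract's code, and ">=2.3.0" is a hypothetical loosened requirement, not something textract ships.

```python
import operator
import re

# Comparison operators handled by this sketch (a small subset of PEP 440).
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

_CLAUSE = re.compile(r"\s*(==|!=|>=|<=|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*")


def parse_version(text):
    """Turn '3.0.4' into (3, 0, 4); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(version, width):
    """Right-pad with zeros so that 1.10 and 1.10.0 compare as equal."""
    return version + (0,) * (width - len(version))


def satisfies(installed, specifier):
    """Return True if `installed` meets every comma-separated clause."""
    have = parse_version(installed)
    for clause in specifier.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, wanted_text = match.groups()
        wanted = parse_version(wanted_text)
        width = max(len(have), len(wanted))
        if not _OPS[op](_pad(have, width), _pad(wanted, width)):
            return False
    return True


# The pin from the warning rejects the newer chardet ...
print(satisfies("3.0.4", "==2.3.0"))         # False
# ... while a hypothetical loosened range would accept it.
print(satisfies("3.0.4", ">=2.3.0"))         # True
# requests' range, as quoted in this thread, rejects the pinned version.
print(satisfies("2.3.0", "<3.1.0,>=3.0.2"))  # False
```

Real tools follow the full PEP 440 rules (pre-releases, wildcards, and so on), which this sketch deliberately leaves out.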

benman1 commented 5 years ago

That's what I am saying. You don't understand, please read again.

apples-air:textract ben$ pip freeze | grep "chardet"
chardet==3.0.4
apples-air:textract ben$ pip freeze | grep "six"
six==1.12.0
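The same lookup can be done from Python instead of piping pip freeze through grep. This is a minimal sketch, assuming Python 3.8 or newer, where importlib.metadata is in the standard library; the environments in this thread run Python 2.7, where it is not available. The helper name installed_version is hypothetical.

```python
from importlib import metadata  # standard library on Python 3.8+


def installed_version(name):
    """Return the installed version string for `name`, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Print whatever versions are present in the current environment.
for package in ("chardet", "six"):
    print(package, installed_version(package))
```

The printed versions depend on the environment the script runs in, so comparing the output across machines is one way to spot the kind of mismatch discussed here.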

tvrbanec commented 5 years ago

Then that must be a Debian problem?

pip install --upgrade textract

Requirement already up-to-date: textract in /usr/local/lib/python2.7/dist-packages (1.6.1)
Requirement already satisfied, skipping upgrade: beautifulsoup4==4.5.3 in /usr/local/lib/python2.7/dist-packages (from textract) (4.5.3)
Requirement already satisfied, skipping upgrade: argcomplete==1.8.2 in /usr/local/lib/python2.7/dist-packages (from textract) (1.8.2)
Requirement already satisfied, skipping upgrade: docx2txt==0.6 in /usr/local/lib/python2.7/dist-packages (from textract) (0.6)
Requirement already satisfied, skipping upgrade: SpeechRecognition==3.6.3 in /usr/local/lib/python2.7/dist-packages (from textract) (3.6.3)
Requirement already satisfied, skipping upgrade: EbookLib==0.15 in /usr/local/lib/python2.7/dist-packages (from textract) (0.15)
Requirement already satisfied, skipping upgrade: xlrd==1.0.0 in /usr/local/lib/python2.7/dist-packages (from textract) (1.0.0)
Requirement already satisfied, skipping upgrade: pocketsphinx==0.1.3 in /usr/local/lib/python2.7/dist-packages (from textract) (0.1.3)
Collecting chardet==2.3.0 (from textract)
Using cached https://files.pythonhosted.org/packages/7e/5c/605ca2daa5cf21c87690d8fe6ab05a6f2278c451f4ede6456dd26453f4bd/chardet-2.3.0-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: python-pptx==0.6.5 in /usr/local/lib/python2.7/dist-packages (from textract) (0.6.5)
Collecting six==1.10.0 (from textract)
Using cached https://files.pythonhosted.org/packages/c8/0a/b6723e1bc4c516cb687841499455a8505b44607ab535be01091c0f24f079/six-1.10.0-py2.py3-none-any.whl
Requirement already satisfied, skipping upgrade: lxml in /usr/lib/python2.7/dist-packages (from EbookLib==0.15->textract) (4.2.5)
Requirement already satisfied, skipping upgrade: Pillow>=2.6.1 in /usr/local/lib/python2.7/dist-packages (from python-pptx==0.6.5->textract) (5.1.0)
Requirement already satisfied, skipping upgrade: XlsxWriter>=0.5.7 in /usr/local/lib/python2.7/dist-packages (from python-pptx==0.6.5->textract) (1.0.5)
requests 2.19.0 has requirement chardet<3.1.0,>=3.0.2, but you'll have chardet 2.3.0 which is incompatible.
cheroot 6.5.4 has requirement six>=1.11.0, but you'll have six 1.10.0 which is incompatible.
cherrypy 17.4.1 has requirement six>=1.11.0, but you'll have six 1.10.0 which is incompatible.
Installing collected packages: chardet, six
Found existing installation: chardet 3.0.4
Uninstalling chardet-3.0.4:
Successfully uninstalled chardet-3.0.4
Found existing installation: six 1.12.0
Uninstalling six-1.12.0:
Successfully uninstalled six-1.12.0
Successfully installed chardet-2.3.0 six-1.10.0

tvrbanec commented 5 years ago

Note:
Collecting chardet==2.3.0 (from textract)
Collecting six==1.10.0 (from textract)

tvrbanec commented 5 years ago

And this as a consequence:
requests 2.19.0 has requirement chardet<3.1.0,>=3.0.2, but you'll have chardet 2.3.0 which is incompatible.
cheroot 6.5.4 has requirement six>=1.11.0, but you'll have six 1.10.0 which is incompatible.
cherrypy 17.4.1 has requirement six>=1.11.0, but you'll have six 1.10.0 which is incompatible.
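To see which installed packages declare a requirement on chardet or six, and with what constraint, the declared metadata can be scanned directly. This is a rough sketch, assuming Python 3.8+ for importlib.metadata. The helpers normalize, requirement_name and dependents_of are hypothetical names, and the name extraction is a simple regular expression rather than a full requirement parser.

```python
import re
from importlib import metadata  # standard library on Python 3.8+


def normalize(name):
    """Lower-case a project name and collapse runs of '-', '_' and '.'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a string such as 'six>=1.11.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else None


def dependents_of(target):
    """List (distribution name, requirement string) pairs naming `target`."""
    wanted = normalize(target)
    found = []
    for dist in metadata.distributions():
        owner = dist.metadata.get("Name") or "unknown"
        for requirement in dist.requires or []:
            if requirement_name(requirement) == wanted:
                found.append((owner, requirement))
    return sorted(found)


# Show every declared constraint on the two packages from this thread.
for package in ("chardet", "six"):
    for owner, requirement in dependents_of(package):
        print(owner, "requires", requirement)
```

On a machine like the one in this thread, the output would be expected to list textract's exact pins alongside the ranges declared by requests, cheroot and cherrypy, which makes the conflict visible in one place.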

tvrbanec commented 5 years ago

Running setup also uses six==1.10.0:

python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to textract.egg-info/requires.txt
writing textract.egg-info/PKG-INFO
writing top-level names to textract.egg-info/top_level.txt
writing dependency_links to textract.egg-info/dependency_links.txt
reading manifest file 'textract.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.py[co]' found under directory '*'
warning: no previously-included files matching '*~' found under directory '*'
warning: no previously-included files matching '*.orig' found under directory '*'
writing manifest file 'textract.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/textract
copying build/lib.linux-x86_64-2.7/textract/exceptions.py -> build/bdist.linux-x86_64/egg/textract
creating build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/odt_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/epub_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/csv_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/msg_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/png_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/audio.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/psv_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/mp3_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/wav_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/rtf_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/doc_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/json_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/xls_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/pdf_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/docx_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/jpg_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/ps_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/gif_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/tiff_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/image.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/eml_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/txt_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/xlsx_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/html_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/ogg_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/utils.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/tsv_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/__init__.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/parsers/pptx_parser.py -> build/bdist.linux-x86_64/egg/textract/parsers
copying build/lib.linux-x86_64-2.7/textract/cli.py -> build/bdist.linux-x86_64/egg/textract
copying build/lib.linux-x86_64-2.7/textract/__init__.py -> build/bdist.linux-x86_64/egg/textract
copying build/lib.linux-x86_64-2.7/textract/colors.py -> build/bdist.linux-x86_64/egg/textract
byte-compiling build/bdist.linux-x86_64/egg/textract/exceptions.py to exceptions.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/odt_parser.py to odt_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/epub_parser.py to epub_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/csv_parser.py to csv_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/msg_parser.py to msg_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/png_parser.py to png_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/audio.py to audio.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/psv_parser.py to psv_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/mp3_parser.py to mp3_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/wav_parser.py to wav_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/rtf_parser.py to rtf_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/doc_parser.py to doc_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/json_parser.py to json_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/xls_parser.py to xls_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/pdf_parser.py to pdf_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/docx_parser.py to docx_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/jpg_parser.py to jpg_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/ps_parser.py to ps_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/gif_parser.py to gif_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/tiff_parser.py to tiff_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/image.py to image.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/eml_parser.py to eml_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/txt_parser.py to txt_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/xlsx_parser.py to xlsx_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/html_parser.py to html_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/ogg_parser.py to ogg_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/utils.py to utils.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/tsv_parser.py to tsv_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/parsers/pptx_parser.py to pptx_parser.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/cli.py to cli.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/__init__.py to __init__.pyc
byte-compiling build/bdist.linux-x86_64/egg/textract/colors.py to colors.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
installing scripts to build/bdist.linux-x86_64/egg/EGG-INFO/scripts
running install_scripts
running build_scripts
creating build/bdist.linux-x86_64/egg/EGG-INFO/scripts
copying build/scripts-2.7/textract -> build/bdist.linux-x86_64/egg/EGG-INFO/scripts
changing mode of build/bdist.linux-x86_64/egg/EGG-INFO/scripts/textract to 755
copying textract.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying textract.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying textract.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying textract.egg-info/not-zip-safe -> build/bdist.linux-x86_64/egg/EGG-INFO
copying textract.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying textract.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
creating 'dist/textract-1.6.1-py2.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing textract-1.6.1-py2.7.egg
removing '/usr/local/lib/python2.7/dist-packages/textract-1.6.1-py2.7.egg' (and everything under it)
creating /usr/local/lib/python2.7/dist-packages/textract-1.6.1-py2.7.egg
Extracting textract-1.6.1-py2.7.egg to /usr/local/lib/python2.7/dist-packages
textract 1.6.1 is already the active version in easy-install.pth
Installing textract script to /usr/local/bin

Installed /usr/local/lib/python2.7/dist-packages/textract-1.6.1-py2.7.egg
Processing dependencies for textract==1.6.1
Searching for xlrd==1.0.0
Best match: xlrd 1.0.0
Adding xlrd 1.0.0 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for six==1.10.0
Best match: six 1.10.0
Adding six 1.10.0 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for python-pptx==0.6.6
Best match: python-pptx 0.6.6
Processing python_pptx-0.6.6-py2.7.egg
python-pptx 0.6.6 is already the active version in easy-install.pth

Using /usr/local/lib/python2.7/dist-packages/python_pptx-0.6.6-py2.7.egg
Searching for pocketsphinx==0.1.3
Best match: pocketsphinx 0.1.3
Adding pocketsphinx 0.1.3 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for docx2txt==0.6
Best match: docx2txt 0.6
Adding docx2txt 0.6 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for chardet==3.0.4
Best match: chardet 3.0.4
Adding chardet 3.0.4 to easy-install.pth file
Installing chardetect script to /usr/local/bin

Using /usr/lib/python2.7/dist-packages
Searching for beautifulsoup4==4.6.0
Best match: beautifulsoup4 4.6.0
Processing beautifulsoup4-4.6.0-py2.7.egg
beautifulsoup4 4.6.0 is already the active version in easy-install.pth

Using /usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.6.0-py2.7.egg
Searching for argcomplete==1.8.2
Best match: argcomplete 1.8.2
Adding argcomplete 1.8.2 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for SpeechRecognition==3.7.1
Best match: SpeechRecognition 3.7.1
Processing SpeechRecognition-3.7.1-py2.7.egg
SpeechRecognition 3.7.1 is already the active version in easy-install.pth

Using /usr/local/lib/python2.7/dist-packages/SpeechRecognition-3.7.1-py2.7.egg
Searching for EbookLib==0.16
Best match: EbookLib 0.16
Processing EbookLib-0.16-py2.7.egg
EbookLib 0.16 is already the active version in easy-install.pth

Using /usr/local/lib/python2.7/dist-packages/EbookLib-0.16-py2.7.egg
Searching for lxml==4.2.5
Best match: lxml 4.2.5
Adding lxml 4.2.5 to easy-install.pth file

Using /usr/lib/python2.7/dist-packages
Searching for XlsxWriter==1.0.5
Best match: XlsxWriter 1.0.5
Adding XlsxWriter 1.0.5 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Searching for Pillow==5.1.0
Best match: Pillow 5.1.0
Adding Pillow 5.1.0 to easy-install.pth file

Using /usr/local/lib/python2.7/dist-packages
Finished processing dependencies for textract==1.6.1

tvrbanec commented 5 years ago

After that I needed to upgrade chardet:
pip install --upgrade chardet
Collecting chardet
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
textract 1.6.1 has requirement chardet==2.3.0, but you'll have chardet 3.0.4 which is incompatible.
Installing collected packages: chardet
Found existing installation: chardet 2.3.0
Uninstalling chardet-2.3.0:
Successfully uninstalled chardet-2.3.0
Successfully installed chardet-3.0.4

DanielSwain commented 5 years ago

I've posted this comment on a derivative of textract (Wagtail Textract), but it concerns installing textract directly from @deanmalmgren's repo, so it might be helpful here.

jpweytjens commented 5 years ago

Closed by #292.