deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Unable to install textract, getting pocketsphinx error #284

Closed tarunshah closed 5 years ago

tarunshah commented 5 years ago

I had already installed swig using conda install swig. I also downloaded the EbookLib 0.15 zip from the releases, unzipped it, and manually removed (with Notepad++) the Unicode character in the README.md file (it is on line 44).

Then I navigated to the unzipped EbookLib 0.15 folder:

cd to_unzipped_folder_path_here

and pip installed it.

Then, when installing textract, I get this error:

(base) C:\WINDOWS\system32>pip install textract
Collecting textract
Requirement already satisfied: python-pptx==0.6.5 in c:\programdata\anaconda3\lib\site-packages (from textract) (0.6.5)
Requirement already satisfied: xlrd==1.0.0 in c:\programdata\anaconda3\lib\site-packages (from textract) (1.0.0)
Requirement already satisfied: SpeechRecognition==3.6.3 in c:\programdata\anaconda3\lib\site-packages (from textract) (3.6.3)
Requirement already satisfied: argcomplete==1.8.2 in c:\programdata\anaconda3\lib\site-packages (from textract) (1.8.2)
Requirement already satisfied: EbookLib==0.15 in c:\users\username\downloads\ebooklib-0.15 (from textract) (0.15)
Requirement already satisfied: six==1.10.0 in c:\programdata\anaconda3\lib\site-packages (from textract) (1.10.0)
Requirement already satisfied: docx2txt==0.6 in c:\programdata\anaconda3\lib\site-packages (from textract) (0.6)
Requirement already satisfied: chardet==2.3.0 in c:\programdata\anaconda3\lib\site-packages (from textract) (2.3.0)
Collecting pocketsphinx==0.1.3 (from textract)
  Using cached https://files.pythonhosted.org/packages/93/5f/a968e5d53d25e32deb78c3e169fd8612ecf53cc76e32cb40e19be35696af/pocketsphinx-0.1.3.tar.bz2
Requirement already satisfied: beautifulsoup4==4.5.3 in c:\programdata\anaconda3\lib\site-packages (from textract) (4.5.3)
Requirement already satisfied: lxml>=3.1.0 in c:\programdata\anaconda3\lib\site-packages (from python-pptx==0.6.5->textract) (4.3.0)
Requirement already satisfied: XlsxWriter>=0.5.7 in c:\programdata\anaconda3\lib\site-packages (from python-pptx==0.6.5->textract) (1.1.2)
Requirement already satisfied: Pillow>=2.6.1 in c:\programdata\anaconda3\lib\site-packages (from python-pptx==0.6.5->textract) (5.4.1)
Building wheels for collected packages: pocketsphinx
  Building wheel for pocketsphinx (setup.py) ... error
  ERROR: Complete output from command 'c:\programdata\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\\Users\\username\\AppData\\Local\\Temp\\pip-install-aoq_jexm\\pocketsphinx\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d 'C:\Users\username\AppData\Local\Temp\pip-wheel-iy1l6id3' --python-tag cp36:
  ERROR: running bdist_wheel
  running build_ext
  building 'sphinxbase._ad' extension
  swigging swig/sphinxbase/ad.i to swig/sphinxbase/ad_wrap.c
  c:\programdata\anaconda3\Library\bin\swig.exe -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/win32 -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
  creating build
  creating build\temp.win-amd64-3.6
  creating build\temp.win-amd64-3.6\Release
  creating build\temp.win-amd64-3.6\Release\swig
  creating build\temp.win-amd64-3.6\Release\swig\sphinxbase
  creating build\temp.win-amd64-3.6\Release\deps
  creating build\temp.win-amd64-3.6\Release\deps\sphinxbase
  creating build\temp.win-amd64-3.6\Release\deps\sphinxbase\src
  creating build\temp.win-amd64-3.6\Release\deps\sphinxbase\src\libsphinxad
  C:\ProgramData\Anaconda3\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall -DMS_WIN64 -DSPHINXBASE_EXPORTS -DPOCKETSPHINX_EXPORTS -DSPHINX_DLL -DHAVE_CONFIG_H -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/win32 -Ic:\programdata\anaconda3\include -Ic:\programdata\anaconda3\include -c swig/sphinxbase/ad_wrap.c -o build\temp.win-amd64-3.6\Release\swig\sphinxbase\ad_wrap.o /wd4244 /wd4267 /wd4197 /wd4090 /wd4018 /wd4311 /wd4312 /wd4334 /wd4477 /wd4996
  gcc: error: /wd4244: No such file or directory
  gcc: error: /wd4267: No such file or directory
  gcc: error: /wd4197: No such file or directory
  gcc: error: /wd4090: No such file or directory
  gcc: error: /wd4018: No such file or directory
  gcc: error: /wd4311: No such file or directory
  gcc: error: /wd4312: No such file or directory
  gcc: error: /wd4334: No such file or directory
  gcc: error: /wd4477: No such file or directory
  gcc: error: /wd4996: No such file or directory
  error: command 'C:\\ProgramData\\Anaconda3\\Library\\mingw-w64\\bin\\gcc.exe' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for pocketsphinx
  Running setup.py clean for pocketsphinx
Failed to build pocketsphinx
Installing collected packages: pocketsphinx, textract
  Running setup.py install for pocketsphinx ... error
    ERROR: Complete output from command 'c:\programdata\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\\Users\\username\\AppData\\Local\\Temp\\pip-install-aoq_jexm\\pocketsphinx\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\username\AppData\Local\Temp\pip-record-f0jja4xo\install-record.txt' --single-version-externally-managed --compile:
    ERROR: running install
    running build_ext
    building 'sphinxbase._ad' extension
    swigging swig/sphinxbase/ad.i to swig/sphinxbase/ad_wrap.c
    c:\programdata\anaconda3\Library\bin\swig.exe -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/win32 -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
    creating build
    creating build\temp.win-amd64-3.6
    creating build\temp.win-amd64-3.6\Release
    creating build\temp.win-amd64-3.6\Release\swig
    creating build\temp.win-amd64-3.6\Release\swig\sphinxbase
    creating build\temp.win-amd64-3.6\Release\deps
    creating build\temp.win-amd64-3.6\Release\deps\sphinxbase
    creating build\temp.win-amd64-3.6\Release\deps\sphinxbase\src
    creating build\temp.win-amd64-3.6\Release\deps\sphinxbase\src\libsphinxad
    C:\ProgramData\Anaconda3\Library\mingw-w64\bin\gcc.exe -mdll -O -Wall -DMS_WIN64 -DSPHINXBASE_EXPORTS -DPOCKETSPHINX_EXPORTS -DSPHINX_DLL -DHAVE_CONFIG_H -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/win32 -Ic:\programdata\anaconda3\include -Ic:\programdata\anaconda3\include -c swig/sphinxbase/ad_wrap.c -o build\temp.win-amd64-3.6\Release\swig\sphinxbase\ad_wrap.o /wd4244 /wd4267 /wd4197 /wd4090 /wd4018 /wd4311 /wd4312 /wd4334 /wd4477 /wd4996
    gcc: error: /wd4244: No such file or directory
    gcc: error: /wd4267: No such file or directory
    gcc: error: /wd4197: No such file or directory
    gcc: error: /wd4090: No such file or directory
    gcc: error: /wd4018: No such file or directory
    gcc: error: /wd4311: No such file or directory
    gcc: error: /wd4312: No such file or directory
    gcc: error: /wd4334: No such file or directory
    gcc: error: /wd4477: No such file or directory
    gcc: error: /wd4996: No such file or directory
    error: command 'C:\\ProgramData\\Anaconda3\\Library\\mingw-w64\\bin\\gcc.exe' failed with exit status 1
    ----------------------------------------
ERROR: Command "'c:\programdata\anaconda3\python.exe' -u -c 'import setuptools, tokenize;__file__='"'"'C:\\Users\\username\\AppData\\Local\\Temp\\pip-install-aoq_jexm\\pocketsphinx\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\username\AppData\Local\Temp\pip-record-f0jja4xo\install-record.txt' --single-version-externally-managed --compile" failed with error code 1 in C:\Users\username\AppData\Local\Temp\pip-install-aoq_jexm\pocketsphinx\

nick-ds commented 5 years ago

I'm getting the exact same error. Have you figured out how to solve it?
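
For anyone comparing their own output against the log above: the arguments gcc rejects (/wd4244, /wd4267, and so on) are MSVC switches for suppressing specific warnings. gcc does not recognise them and treats them as input file names, which suggests the build picked up the MinGW gcc from Anaconda while the pocketsphinx setup.py passed MSVC-only flags. Below is a minimal Python sketch for checking whether a saved pip log shows the same pattern. The function name and the sample log are illustrative assumptions, not part of pip or pocketsphinx.

```python
import re

# Matches lines such as "gcc: error: /wd4244: No such file or directory".
MSVC_FLAG_ERROR = re.compile(r"gcc: error: (/wd\d+): No such file or directory")


def msvc_flags_rejected_by_gcc(log_text):
    """Return the MSVC-style /wdNNNN switches that gcc reported as missing files.

    Hypothetical helper: each flag is listed once, in order of first appearance.
    """
    seen = []
    for match in MSVC_FLAG_ERROR.finditer(log_text):
        flag = match.group(1)
        if flag not in seen:
            seen.append(flag)
    return seen


# A shortened sample in the shape of the log above (the flag repeats because
# pip attempts the build twice: once for the wheel, once for the install).
sample_log = """\
  gcc: error: /wd4244: No such file or directory
  gcc: error: /wd4267: No such file or directory
    gcc: error: /wd4244: No such file or directory
"""

print(msvc_flags_rejected_by_gcc(sample_log))
```

If the returned list is non-empty, the failure is the same compiler/flag mismatch rather than a missing dependency.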

sroertgen commented 5 years ago

Hello there,

Try the following workaround (it relies heavily on CharushS' solution for the sphinxbase problem):

git clone --recursive https://github.com/bambocher/pocketsphinx-python
cd pocketsphinx-python

Edit file pocketsphinx-python/deps/sphinxbase/src/libsphinxad/ad_openal.c

Change

#include <al.h>
#include <alc.h>

to

#include <OpenAL/al.h>
#include <OpenAL/alc.h>
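
If you would rather not edit the file by hand, the include change above can be scripted. This is a minimal Python sketch, assuming the file contains the two include lines exactly as written above; the function names are hypothetical and nothing here comes from the pocketsphinx-python repository itself.

```python
from pathlib import Path


def use_prefixed_openal_includes(source_text):
    """Rewrite the two bare OpenAL includes to the OpenAL/ prefixed form."""
    replacements = {
        "#include <al.h>": "#include <OpenAL/al.h>",
        "#include <alc.h>": "#include <OpenAL/alc.h>",
    }
    for old, new in replacements.items():
        source_text = source_text.replace(old, new)
    return source_text


def patch_file(path):
    """Apply the rewrite to a file in place (hypothetical convenience wrapper)."""
    path = Path(path)
    path.write_text(use_prefixed_openal_includes(path.read_text()))


# Demonstration on an in-memory sample rather than the real file.
sample = "#include <al.h>\n#include <alc.h>\n"
print(use_prefixed_openal_includes(sample), end="")
```

From inside the pocketsphinx-python directory, calling patch_file("deps/sphinxbase/src/libsphinxad/ad_openal.c") would apply the change to the file named above. Running it a second time leaves the file unchanged, because the prefixed lines no longer match the search strings.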

Then run: python setup.py install

After that:

git clone https://github.com/deanmalmgren/textract.git
cd textract
nano requirements/python

Comment out pocketsphinx==0.1.3 so the file looks like this:

# This file contains all python dependencies that are required by the textract
# package in order for it to properly work.

argcomplete==1.8.2
chardet==3.0.4
python-pptx==0.6.6
#pdfminer.six <-- go back to this after the shebang fix is released (see https://github.com/goulu/pdfminer/issues/27)
https://github.com/goulu/pdfminer/zipball/e6ad15af79a26c31f4e384d8427b375c93b03533#egg=pdfminer.six
docx2txt==0.6
beautifulsoup4==4.6.0
xlrd==1.0.0
EbookLib==0.16
SpeechRecognition==3.7.1
https://github.com/mattgwwalker/msg-extractor/zipball/master
six==1.10.0
#pocketsphinx==0.1.3
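
On a machine without nano (for example the Windows setup in the original report), the same edit can be made with a few lines of Python. This is only a sketch: the function name is hypothetical, and it assumes the requirement is pinned with == at the start of a line, as in the listing above.

```python
def comment_out_requirement(requirements_text, package):
    """Prefix every uncommented line that pins `package` with '#'."""
    out = []
    for line in requirements_text.splitlines():
        # Lines that already start with '#' do not match and are left alone.
        if line.strip().startswith(package + "=="):
            line = "#" + line
        out.append(line)
    return "\n".join(out) + "\n"


# Demonstration on a two-line sample rather than the real file.
sample = "six==1.10.0\npocketsphinx==0.1.3\n"
print(comment_out_requirement(sample, "pocketsphinx"), end="")
```

To apply it to the real file, read requirements/python, pass its contents through the function, and write the result back.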

After that, run: python setup.py install