explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Cannot install on Debian 7 i386 #92

Closed wohali closed 8 years ago

wohali commented 9 years ago

For reference, Debian 7 ships Python 2.7.3.

I've tried both system python and a virtualenv, using pip and installing from source. After failing to install in system python, I rm -rf'ed spacy from system python before attempting a virtualenv pip install.

pip install itself succeeds in both system Python and a virtualenv, but after downloading the model (I tried the download three times, with consistent md5sums each time) I get data integrity failures:

(virt1)[514 atypical:joant virt1] $ python
Python 2.7.3 (default, Mar 14 2014, 11:57:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from __future__ import unicode_literals, print_function
>>> from spacy.en import English
>>> nlp = English()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/joant/mcguffey/virt1/local/lib/python2.7/site-packages/spacy/en/__init__.py", line 92, in __init__
    oov_prob=oov_prob)
  File "spacy/vocab.pyx", line 59, in spacy.vocab.Vocab.__init__ (spacy/vocab.cpp:2836)
  File "spacy/vocab.pyx", line 224, in spacy.vocab.Vocab.load_lexemes (spacy/vocab.cpp:5453)
IOError: Error reading from lexemes.bin. Integrity check fails.
>>> exit()
(virt1)[515 atypical:joant virt1] $ md5sum ./lib/python2.7/site-packages/spacy/en/data/vocab/lexemes.bin
0545807daed0f40d95502a21223dc35f  ./lib/python2.7/site-packages/spacy/en/data/vocab/lexemes.bin

What am I doing wrong? At a guess, is there a mismatch between model and data file versions?

For reference, installing from source fails during compilation because no .c files are found; see https://gist.github.com/wohali/028bc16e820d4e0932c4 for details of the failing step. (I followed your steps from http://spacy.io/#install to the letter.)
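The md5sum comparison above can also be done from Python, for example as a check before loading the model. Below is a minimal sketch built on the standard library's `hashlib`. The function name and chunk size are illustrative, not part of spaCy, and the expected checksum would have to come from whoever publishes the model.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `md5_of_file("./lib/python2.7/site-packages/spacy/en/data/vocab/lexemes.bin")` should return the same hex digest that `md5sum` prints.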

wohali commented 9 years ago

FYI, I've managed to get it installed and running correctly on another Debian 7 machine, a 64-bit one. The VM with the problem is 32-bit. Does spaCy require a 64-bit system?
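You can check the 32-bit vs 64-bit question directly from the interpreter using the standard `struct` module. This is a diagnostic sketch, not spaCy code. The thread never spells out what the 32-bit bug was. Native type sizes are shown only because they are a common reason binary data files behave differently across the two kinds of build.

```python
import struct
import sys


def python_build_info():
    """Summarise the word sizes of the running Python interpreter."""
    return {
        # Size of a C pointer: 32 on a 32-bit build, 64 on a 64-bit build.
        "pointer_bits": struct.calcsize("P") * 8,
        # Native C long: 4 bytes on 32-bit Linux, 8 bytes on 64-bit Linux.
        "native_long_bytes": struct.calcsize("l"),
        # Explicit little-endian int64 ("<q") is 8 bytes on every platform.
        "fixed_int64_bytes": struct.calcsize("<q"),
        "is_64bit": sys.maxsize > 2**32,
    }
```

On the 32-bit VM, `pointer_bits` should come out as 32. Binary data written with native sizes on one kind of build can fail to read back on the other, whereas explicit formats such as `<q` avoid that.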

honnibal commented 9 years ago

If it does, that's a bug. Thanks for the report, I'll look into this.

honnibal commented 9 years ago

Okay, I'm pretty sure I see the problem. Will have this fixed shortly.

honnibal commented 9 years ago

I'm still trying to work out what's wrong with the compilation, though. Those steps are working for me on Ubuntu, and I see from your log that everything looks right.

I'll make a Debian instance on AWS and work through this.

honnibal commented 9 years ago

I'm still having trouble replicating the compilation problem. You've definitely done everything correctly.

The first failing line is line 210 of your log:

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I. -I/home/joant/mcguffey/spaCy/.env/include -I/usr/include/python2.7 -c murmurhash/mrmr.c -o build/temp.linux-x86_64-2.7/murmurhash/mrmr.o
gcc: error: murmurhash/mrmr.c: No such file or directory

setuptools ran the build command for murmurhash/mrmr.c, when it should have used murmurhash/mrmr.cpp.

This reminds me of an old bug in setuptools, where Cython extensions would be forced to compile as .c files:

http://bugs.python.org/setuptools/issue138
https://bitbucket.org/pypa/setuptools/issues/288/cannot-specify-cython-under-setup_requires

However, the build works for me with both the default version of setuptools on Ubuntu (2.2) and the latest (18.1). Debian Jessie appears to ship setuptools 5.5.1; I tried that too, and it also worked. Could you please run the following in a fresh directory:

virtualenv .env && source .env/bin/activate
pip list

And paste the output?
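`pip list` shows what is installed. To record the same information from inside the interpreter that will actually run the build, you could use `importlib.metadata`. Note that this is a Python 3.8+ sketch; the Debian 7 box in this thread ran Python 2.7, where that module does not exist. The function name and the default package list are illustrative.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_versions(names=("pip", "setuptools", "wheel", "Cython", "murmurhash")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```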

Next steps:

1) Try installing cython and murmurhash in sequence, rather than relying on the dependency resolution

pip install cython
pip install murmurhash

Does that succeed? If so, try doing that before you run "pip install -r requirements.txt". The setup_requires mechanism in setuptools can be unreliable.

2) Try updating setuptools

pip install --upgrade setuptools
pip install cython
pip install murmurhash

Again, thanks for your patience on this. I really want the install process to go smoothly, and I appreciate the time you've put into trying the library and reporting problems.

wohali commented 9 years ago

Output of pip list on the system Python:

chardet (2.3.0)
defusedxml (0.4.1)
docutils (0.12)
Pillow (2.6.1)
pip (7.1.2)
Pygments (2.0.1)
python-apt (0.9.3.12)
python-debian (0.1.27)
python-debianbts (1.11)
reportbug (6.6.3)
roman (2.0.0)
setuptools (18.3.1)
six (1.8.0)
SOAPpy (0.12.22)
virtualenv (13.1.2)
wheel (0.24.0)
wstools (0.4.3)

Output of pip list from a brand new virtualenv:

pip (7.1.2)
setuptools (18.2)
wheel (0.24.0)

Doing pip install cython followed by pip install murmurhash is fine:

(.env) $ pip install cython
Collecting cython
Installing collected packages: cython
Successfully installed cython-0.23.2
(.env) $ pip install murmurhash
Collecting murmurhash
Installing collected packages: murmurhash
Successfully installed murmurhash-0.24

I then proceeded to pip install -r requirements.txt which installed:

Successfully installed cymem-1.11 numpy-1.9.2 pathlib-1.0.1 plac-0.9.1 preshed-0.41 six-1.9.0 thinc-3.3 ujson-1.33 unidecode-0.4.18 wget-2.2

but the build still fails in the same way:

(.env) $ pip install -r requirements.txt
Requirement already satisfied (use --upgrade to upgrade): cython in /home/joant/.env/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Collecting cymem==1.11 (from -r requirements.txt (line 2))
Collecting pathlib (from -r requirements.txt (line 3))
Collecting preshed==0.41 (from -r requirements.txt (line 4))
Collecting thinc==3.3 (from -r requirements.txt (line 5))
Requirement already satisfied (use --upgrade to upgrade): murmurhash==0.24 in /home/joant/.env/lib/python2.7/site-packages (from -r requirements.txt (line 6))
Collecting unidecode (from -r requirements.txt (line 7))
Collecting numpy (from -r requirements.txt (line 8))
Collecting wget (from -r requirements.txt (line 9))
Collecting plac (from -r requirements.txt (line 10))
Collecting six (from -r requirements.txt (line 11))
  Using cached six-1.9.0-py2.py3-none-any.whl
Collecting ujson (from -r requirements.txt (line 12))
Installing collected packages: cymem, pathlib, preshed, thinc, unidecode, numpy, wget, plac, six, ujson
Successfully installed cymem-1.11 numpy-1.9.2 pathlib-1.0.1 plac-0.9.1 preshed-0.41 six-1.9.0 thinc-3.3 ujson-1.33 unidecode-0.4.18 wget-2.2
(.env)[669 atypical:joant spaCy] $ python setup.py build_ext --inplace
running build_ext
cythoning spacy/parts_of_speech.pyx to spacy/parts_of_speech.cpp
building 'spacy.parts_of_speech' extension
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/spacy
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I. -I/home/joant/.env/include -I/usr/include/python2.7 -c spacy/parts_of_speech.cpp -o build/temp.linux-x86_64-2.7/spacy/parts_of_speech.o -O3 -Wno-strict-prototypes
cc1plus: warning: command line option ‘-Wno-strict-prototypes’ is valid for C/ObjC but not for C++
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/spacy/parts_of_speech.o -o /home/joant/spaCy/spacy/parts_of_speech.so
cythoning spacy/strings.pyx to spacy/strings.cpp
building 'spacy.strings' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I. -I/home/joant/.env/include -I/usr/include/python2.7 -c spacy/strings.cpp -o build/temp.linux-x86_64-2.7/spacy/strings.o -O3 -Wno-strict-prototypes
cc1plus: warning: command line option ‘-Wno-strict-prototypes’ is valid for C/ObjC but not for C++
spacy/strings.cpp:249:36: fatal error: murmurhash/MurmurHash3.h: No such file or directory
 #include "murmurhash/MurmurHash3.h"
                                    ^
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

I also tried your suggestion 2 (upgrade setuptools, manually install cython and murmurhash, then run pip install -r requirements.txt and python setup.py build_ext --inplace). It fails at exactly the same place.

If you want, I can probably arrange a temporary shell login on the test box so you can try to diagnose the problem yourself. That said, launching a brand new Debian Jessie 64-bit instance and following your own install steps should work...

honnibal commented 9 years ago

Do you have the package:

python-dev

Installed? I think this might be the problem. When this package isn't installed, virtualenv silently behaves differently. It doesn't create an include/ directory within the env directory, which is where my install scripts are trying to install the Murmurhash headers.
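The standard library's `sysconfig` module is enough to check both points here from Python: whether the development headers are present, and where a virtualenv's `include/` directory would be. This is a sketch, and the function name is made up for illustration.

```python
import os
import sys
import sysconfig


def header_locations():
    """Report where the running interpreter expects C headers to live."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Directory that should contain Python.h (provided by python-dev on Debian).
        "python_include": include_dir,
        "python_h_present": os.path.isfile(os.path.join(include_dir, "Python.h")),
        # Inside a virtualenv, sys.prefix points at the environment directory.
        "env_include": os.path.join(sys.prefix, "include"),
    }
```

If `python_h_present` is False, the development headers are missing. `env_include` is the directory that install scripts writing into the virtualenv would target.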

This was a frustrating one to debug, because I mindlessly install the same set of build dependencies when I launch a new instance. It didn't occur to me that this could be the issue.

wohali commented 9 years ago

Yes, I had python-dev installed already. Was hoping this was it, but it's not.

Here are the contents of a freshly created virtualenv .env:

.env:
bin/  include/  lib/  local/

.env/bin:
activate       activate_this.py   pip*     python*     wheel*
activate.csh   easy_install*      pip2*    python2@
activate.fish  easy_install-2.7*  pip2.7*  python2.7@

.env/include:
python2.7@

.env/lib:
python2.7/

.env/lib/python2.7:
_abcoll.py@   genericpath.py@              posixpath.py@      sre_parse.pyc
_abcoll.pyc   genericpath.pyc              posixpath.pyc      sre.py@
abc.py@       lib-dynload@                 re.py@             stat.py@
abc.pyc       linecache.py@                re.pyc             stat.pyc
codecs.py@    linecache.pyc                site-packages/     types.py@
codecs.pyc    locale.py@                   site.py            types.pyc
copy_reg.py@  locale.pyc                   site.pyc           UserDict.py@
copy_reg.pyc  no-global-site-packages.txt  sre_compile.py@    UserDict.pyc
distutils/    ntpath.py@                   sre_compile.pyc    warnings.py@
encodings@    orig-prefix.txt              sre_constants.py@  warnings.pyc
fnmatch.py@   os.py@                       sre_constants.pyc  _weakrefset.py@
fnmatch.pyc   os.pyc                       sre_parse.py@      _weakrefset.pyc

.env/lib/python2.7/distutils:
distutils.cfg  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages:
easy_install.py   pip-7.1.2.dist-info/        wheel/
easy_install.pyc  pkg_resources/              wheel-0.24.0.dist-info/
_markerlib/       setuptools/
pip/              setuptools-18.2.dist-info/

.env/lib/python2.7/site-packages/_markerlib:
__init__.py  __init__.pyc  markers.py  markers.pyc

.env/lib/python2.7/site-packages/pip:
basecommand.py   compat/         __init__.py    operations/       vcs/
basecommand.pyc  download.py     __init__.pyc   pep425tags.py     _vendor/
baseparser.py    download.pyc    locations.py   pep425tags.pyc    wheel.py
baseparser.pyc   exceptions.py   locations.pyc  req/              wheel.pyc
cmdoptions.py    exceptions.pyc  __main__.py    status_codes.py
cmdoptions.pyc   index.py        __main__.pyc   status_codes.pyc
commands/        index.pyc       models/        utils/

.env/lib/python2.7/site-packages/pip/commands:
completion.py   help.py       install.py   search.py   uninstall.py
completion.pyc  help.pyc      install.pyc  search.pyc  uninstall.pyc
freeze.py       __init__.py   list.py      show.py     wheel.py
freeze.pyc      __init__.pyc  list.pyc     show.pyc    wheel.pyc

.env/lib/python2.7/site-packages/pip/compat:
dictconfig.py  dictconfig.pyc  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/models:
index.py  index.pyc  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/operations:
freeze.py  freeze.pyc  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/req:
__init__.py   req_file.py   req_install.py   req_set.py   req_uninstall.py
__init__.pyc  req_file.pyc  req_install.pyc  req_set.pyc  req_uninstall.pyc

.env/lib/python2.7/site-packages/pip/utils:
appdirs.py   deprecation.py   __init__.py   outdated.py
appdirs.pyc  deprecation.pyc  __init__.pyc  outdated.pyc
build.py     filesystem.py    logging.py    ui.py
build.pyc    filesystem.pyc   logging.pyc   ui.pyc

.env/lib/python2.7/site-packages/pip/vcs:
bazaar.py   git.py   __init__.py   mercurial.py   subversion.py
bazaar.pyc  git.pyc  __init__.pyc  mercurial.pyc  subversion.pyc

.env/lib/python2.7/site-packages/pip/_vendor:
cachecontrol/  __init__.py    lockfile/       progress/     re-vendor.py
colorama/      __init__.pyc   _markerlib/     requests/     re-vendor.pyc
distlib/       ipaddress.py   packaging/      retrying.py   six.py
html5lib/      ipaddress.pyc  pkg_resources/  retrying.pyc  six.pyc

.env/lib/python2.7/site-packages/pip/_vendor/cachecontrol:
adapter.py   caches/        controller.pyc   heuristics.pyc  serialize.pyc
adapter.pyc  compat.py      filewrapper.py   __init__.py     wrapper.py
cache.py     compat.pyc     filewrapper.pyc  __init__.pyc    wrapper.pyc
cache.pyc    controller.py  heuristics.py    serialize.py

.env/lib/python2.7/site-packages/pip/_vendor/cachecontrol/caches:
file_cache.py   __init__.py   redis_cache.py
file_cache.pyc  __init__.pyc  redis_cache.pyc

.env/lib/python2.7/site-packages/pip/_vendor/colorama:
ansi.py   ansitowin32.py   initialise.py   __init__.py   win32.py   winterm.py
ansi.pyc  ansitowin32.pyc  initialise.pyc  __init__.pyc  win32.pyc  winterm.pyc

.env/lib/python2.7/site-packages/pip/_vendor/distlib:
_backport/    index.pyc     manifest.pyc  resources.pyc  util.pyc     wheel.pyc
compat.py     __init__.py   markers.py    scripts.py     version.py
compat.pyc    __init__.pyc  markers.pyc   scripts.pyc    version.pyc
database.py   locators.py   metadata.py   t32.exe        w32.exe
database.pyc  locators.pyc  metadata.pyc  t64.exe        w64.exe
index.py      manifest.py   resources.py  util.py        wheel.py

.env/lib/python2.7/site-packages/pip/_vendor/distlib/_backport:
__init__.py   misc.py   shutil.py   sysconfig.cfg  sysconfig.pyc  tarfile.pyc
__init__.pyc  misc.pyc  shutil.pyc  sysconfig.py   tarfile.py

.env/lib/python2.7/site-packages/pip/_vendor/html5lib:
constants.py     ihatexml.py     inputstream.pyc  tokenizer.pyc  utils.py
constants.pyc    ihatexml.pyc    sanitizer.py     treeadapters/  utils.pyc
filters/         __init__.py     sanitizer.pyc    treebuilders/
html5parser.py   __init__.pyc    serializer/      treewalkers/
html5parser.pyc  inputstream.py  tokenizer.py     trie/

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/filters:
alphabeticalattributes.py   inject_meta_charset.py   sanitizer.py
alphabeticalattributes.pyc  inject_meta_charset.pyc  sanitizer.pyc
_base.py                    lint.py                  whitespace.py
_base.pyc                   lint.pyc                 whitespace.pyc
__init__.py                 optionaltags.py
__init__.pyc                optionaltags.pyc

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/serializer:
htmlserializer.py  htmlserializer.pyc  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/treeadapters:
__init__.py  __init__.pyc  sax.py  sax.pyc

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/treebuilders:
_base.py   dom.py   etree_lxml.py   etree.py   __init__.py
_base.pyc  dom.pyc  etree_lxml.pyc  etree.pyc  __init__.pyc

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/treewalkers:
_base.py   dom.pyc    genshistream.py   __init__.pyc   pulldom.py
_base.pyc  etree.py   genshistream.pyc  lxmletree.py   pulldom.pyc
dom.py     etree.pyc  __init__.py       lxmletree.pyc

.env/lib/python2.7/site-packages/pip/_vendor/html5lib/trie:
_base.py   datrie.py   __init__.py   py.py
_base.pyc  datrie.pyc  __init__.pyc  py.pyc

.env/lib/python2.7/site-packages/pip/_vendor/lockfile:
__init__.py      linklockfile.pyc   pidlockfile.py     sqlitelockfile.pyc
__init__.pyc     mkdirlockfile.py   pidlockfile.pyc    symlinklockfile.py
linklockfile.py  mkdirlockfile.pyc  sqlitelockfile.py  symlinklockfile.pyc

.env/lib/python2.7/site-packages/pip/_vendor/_markerlib:
__init__.py  __init__.pyc  markers.py  markers.pyc

.env/lib/python2.7/site-packages/pip/_vendor/packaging:
__about__.py   _compat.pyc   specifiers.py   _structures.pyc
__about__.pyc  __init__.py   specifiers.pyc  version.py
_compat.py     __init__.pyc  _structures.py  version.pyc

.env/lib/python2.7/site-packages/pip/_vendor/pkg_resources:
__init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/_vendor/progress:
bar.py   counter.py   helpers.py   __init__.py   spinner.py
bar.pyc  counter.pyc  helpers.pyc  __init__.pyc  spinner.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests:
adapters.py   cacert.pem  cookies.pyc     __init__.pyc  status_codes.py
adapters.pyc  certs.py    exceptions.py   models.py     status_codes.pyc
api.py        certs.pyc   exceptions.pyc  models.pyc    structures.py
api.pyc       compat.py   hooks.py        packages/     structures.pyc
auth.py       compat.pyc  hooks.pyc       sessions.py   utils.py
auth.pyc      cookies.py  __init__.py     sessions.pyc  utils.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages:
chardet/  __init__.py  __init__.pyc  urllib3/

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/chardet:
big5freq.py             euckrfreq.py            langhebrewmodel.py
big5freq.pyc            euckrfreq.pyc           langhebrewmodel.pyc
big5prober.py           euckrprober.py          langhungarianmodel.py
big5prober.pyc          euckrprober.pyc         langhungarianmodel.pyc
chardetect.py           euctwfreq.py            langthaimodel.py
chardetect.pyc          euctwfreq.pyc           langthaimodel.pyc
chardistribution.py     euctwprober.py          latin1prober.py
chardistribution.pyc    euctwprober.pyc         latin1prober.pyc
charsetgroupprober.py   gb2312freq.py           mbcharsetprober.py
charsetgroupprober.pyc  gb2312freq.pyc          mbcharsetprober.pyc
charsetprober.py        gb2312prober.py         mbcsgroupprober.py
charsetprober.pyc       gb2312prober.pyc        mbcsgroupprober.pyc
codingstatemachine.py   hebrewprober.py         mbcssm.py
codingstatemachine.pyc  hebrewprober.pyc        mbcssm.pyc
compat.py               __init__.py             sbcharsetprober.py
compat.pyc              __init__.pyc            sbcharsetprober.pyc
constants.py            jisfreq.py              sbcsgroupprober.py
constants.pyc           jisfreq.pyc             sbcsgroupprober.pyc
cp949prober.py          jpcntx.py               sjisprober.py
cp949prober.pyc         jpcntx.pyc              sjisprober.pyc
escprober.py            langbulgarianmodel.py   universaldetector.py
escprober.pyc           langbulgarianmodel.pyc  universaldetector.pyc
escsm.py                langcyrillicmodel.py    utf8prober.py
escsm.pyc               langcyrillicmodel.pyc   utf8prober.pyc
eucjpprober.py          langgreekmodel.py
eucjpprober.pyc         langgreekmodel.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3:
_collections.py     connection.pyc  fields.pyc    packages/        response.py
_collections.pyc    contrib/        filepost.py   poolmanager.py   response.pyc
connectionpool.py   exceptions.py   filepost.pyc  poolmanager.pyc  util/
connectionpool.pyc  exceptions.pyc  __init__.py   request.py
connection.py       fields.py       __init__.pyc  request.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/contrib:
__init__.py   ntlmpool.py   pyopenssl.py
__init__.pyc  ntlmpool.pyc  pyopenssl.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/packages:
__init__.py   ordered_dict.py   six.py   ssl_match_hostname/
__init__.pyc  ordered_dict.pyc  six.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/packages/ssl_match_hostname:
_implementation.py  _implementation.pyc  __init__.py  __init__.pyc

.env/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util:
connection.py   __init__.pyc  response.py   retry.pyc  timeout.py   url.pyc
connection.pyc  request.py    response.pyc  ssl_.py    timeout.pyc
__init__.py     request.pyc   retry.py      ssl_.pyc   url.py

.env/lib/python2.7/site-packages/pip-7.1.2.dist-info:
DESCRIPTION.rst   METADATA       pbr.json  top_level.txt
entry_points.txt  metadata.json  RECORD    WHEEL

.env/lib/python2.7/site-packages/pkg_resources:
__init__.py  __init__.pyc  _vendor/

.env/lib/python2.7/site-packages/pkg_resources/_vendor:
__init__.py  __init__.pyc  packaging/

.env/lib/python2.7/site-packages/pkg_resources/_vendor/packaging:
__about__.py   _compat.pyc   specifiers.py   _structures.pyc
__about__.pyc  __init__.py   specifiers.pyc  version.py
_compat.py     __init__.pyc  _structures.py  version.pyc

.env/lib/python2.7/site-packages/setuptools:
archive_util.py   extension.py       package_index.pyc  ssl_support.py
archive_util.pyc  extension.pyc      py26compat.py      ssl_support.pyc
cli-32.exe        gui-32.exe         py26compat.pyc     unicode_utils.py
cli-64.exe        gui-64.exe         py27compat.py      unicode_utils.pyc
cli-arm-32.exe    gui-arm-32.exe     py27compat.pyc     utils.py
cli.exe           gui.exe            py31compat.py      utils.pyc
command/          __init__.py        py31compat.pyc     version.py
compat.py         __init__.pyc       sandbox.py         version.pyc
compat.pyc        lib2to3_ex.py      sandbox.pyc        windows_support.py
depends.py        lib2to3_ex.pyc     script (dev).tmpl  windows_support.pyc
depends.pyc       msvc9_support.py   script.tmpl
dist.py           msvc9_support.pyc  site-patch.py
dist.pyc          package_index.py   site-patch.pyc

.env/lib/python2.7/site-packages/setuptools/command:
alias.py           build_py.pyc          install_lib.py         saveopts.py
alias.pyc          develop.py            install_lib.pyc        saveopts.pyc
bdist_egg.py       develop.pyc           install.py             sdist.py
bdist_egg.pyc      easy_install.py       install.pyc            sdist.pyc
bdist_rpm.py       easy_install.pyc      install_scripts.py     setopt.py
bdist_rpm.pyc      egg_info.py           install_scripts.pyc    setopt.pyc
bdist_wininst.py   egg_info.pyc          launcher manifest.xml  test.py
bdist_wininst.pyc  __init__.py           register.py            test.pyc
build_ext.py       __init__.pyc          register.pyc           upload_docs.py
build_ext.pyc      install_egg_info.py   rotate.py              upload_docs.pyc
build_py.py        install_egg_info.pyc  rotate.pyc

.env/lib/python2.7/site-packages/setuptools-18.2.dist-info:
dependency_links.txt  entry_points.txt  metadata.json  top_level.txt  zip-safe
DESCRIPTION.rst       METADATA          RECORD         WHEEL

.env/lib/python2.7/site-packages/wheel:
archive.py       egg2wheel.py   install.pyc   paths.pyc       test/
archive.pyc      egg2wheel.pyc  __main__.py   pep425tags.py   tool/
bdist_wheel.py   eggnames.txt   __main__.pyc  pep425tags.pyc  util.py
bdist_wheel.pyc  __init__.py    metadata.py   pkginfo.py      util.pyc
decorator.py     __init__.pyc   metadata.pyc  pkginfo.pyc     wininst2wheel.py
decorator.pyc    install.py     paths.py      signatures/     wininst2wheel.pyc

.env/lib/python2.7/site-packages/wheel/signatures:
djbec.py   ed25519py.py   __init__.py   keys.py
djbec.pyc  ed25519py.pyc  __init__.pyc  keys.pyc

.env/lib/python2.7/site-packages/wheel/test:
complex-dist/                    test_install.py     test_signatures.pyc
headers.dist/                    test_install.pyc    test_tagopt.py
__init__.py                      test_keys.py        test_tagopt.pyc
__init__.pyc                     test_keys.pyc       test_tool.py
pydist-schema.json               test_paths.py       test_tool.pyc
simple.dist/                     test_paths.pyc      test_wheelfile.py
test-1.0-py2.py3-none-win32.whl  test_ranking.py     test_wheelfile.pyc
test_basic.py                    test_ranking.pyc
test_basic.pyc                   test_signatures.py

.env/lib/python2.7/site-packages/wheel/test/complex-dist:
complexdist/  setup.py  setup.pyc

.env/lib/python2.7/site-packages/wheel/test/complex-dist/complexdist:
__init__.py  __init__.pyc

.env/lib/python2.7/site-packages/wheel/test/headers.dist:
header.h  headersdist.py  headersdist.pyc  setup.py  setup.pyc

.env/lib/python2.7/site-packages/wheel/test/simple.dist:
setup.py  setup.pyc  simpledist/

.env/lib/python2.7/site-packages/wheel/test/simple.dist/simpledist:
__init__.py  __init__.pyc

.env/lib/python2.7/site-packages/wheel/tool:
__init__.py  __init__.pyc

.env/lib/python2.7/site-packages/wheel-0.24.0.dist-info:
DESCRIPTION.rst   LICENSE.txt  metadata.json  RECORD.jws     WHEEL
entry_points.txt  METADATA     RECORD         top_level.txt

.env/local:
bin@  include@  lib@

honnibal commented 9 years ago

Super puzzling. I'll reboot the instance and try again.

honnibal commented 9 years ago

Okay so the following just worked for me on a fresh EC2 instance:

Instance type: t2.medium; AMI: debian-jessie-amd64-hvm-2015-06-07-12-27-ebs (ami-116d857a)

(Log in as admin, run sudo adduser me sudo, then su - me)

sudo apt-get update
sudo apt-get install git build-essential python-dev python-virtualenv
git clone https://github.com/honnibal/spaCy
cd spaCy
virtualenv .env && source .env/bin/activate
export PYTHONPATH=`pwd`
pip install -r requirements.txt
python setup.py build_ext --inplace

In your output above, what does this mean:

.env/include:
python2.7@

Is that a symlink or something? It's supposed to be a directory with a bunch of header files.

pquentin commented 9 years ago

This appears to be the output of ls -RF. It means that the directory .env/include only contains "python2.7", which is a symlink to something else.

honnibal commented 9 years ago

Thanks.

That's correct, then: I just checked, and mine's a symlink too.

wohali commented 9 years ago
$ ls -la .env/include/
total 8
drwxr-xr-x 2 joant joant 4096 Sep 22 17:56 ./
drwxr-xr-x 6 joant joant 4096 Sep 22 17:56 ../
lrwxrwxrwx 1 joant joant   22 Sep 22 17:56 python2.7 -> /usr/include/python2.7/

That directory contains all of the Python 2.7 header files.

Not sure what to say. You're still welcome to an account on the box to poke around if you wish; drop me an email at joant@atypical.net.

wohali commented 9 years ago

Aha! If your script is trying to copy header files to .env/include, it will fail because that's a symlink to /usr/include/python2.7, writable only by root:

drwxr-xr-x 2 root root 4096 Sep  8 01:56 /usr/include/python2.7/

I'm basing this on your earlier comment:

It doesn't create an include/ directory within the env directory, which is where my install scripts are trying to install the Murmurhash headers.

Did you only try your test build on your instance as a root-capable user, perhaps?
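The standard `os` module can answer both questions: is `include/` a real directory, and can the current user write to wherever its `python2.7` entry points? Here is a minimal sketch. The function name and report layout are illustrative, and as root every `os.access` write check will simply come back True.

```python
import os


def inspect_include_dir(env_dir):
    """Describe <env_dir>/include and its entries: real dir or symlink, target, writability."""
    include = os.path.join(env_dir, "include")
    report = {
        "include_is_symlink": os.path.islink(include),
        "include_is_dir": os.path.isdir(include),
        "include_writable": os.access(include, os.W_OK),
        "entries": {},
    }
    if report["include_is_dir"]:
        for name in sorted(os.listdir(include)):
            path = os.path.join(include, name)
            report["entries"][name] = {
                "is_symlink": os.path.islink(path),
                # realpath resolves symlinks, e.g. to /usr/include/python2.7
                "target": os.path.realpath(path),
                # os.access follows symlinks, so this checks the *target's* permissions.
                "writable": os.access(path, os.W_OK),
            }
    return report
```

On the box in question, `inspect_include_dir(".env")` should show `include/` as a real directory with a `python2.7` symlink pointing at `/usr/include/python2.7`. For a non-root user, `writable` should be False.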

wohali commented 9 years ago

Shoot, I thought I was close. I added the --always-copy flag when creating the virtualenv, but the build still fails at the same place.

FYI, after running that script:

$ pwd
/home/joant/spaCy
$ find . -name MurmurHash3.h
./.env/lib/python2.7/site-packages/murmurhash/headers/murmurhash/MurmurHash3.h

I read your setup.py, and I see that your headers_workaround isn't being called because use_cython is true (we're calling python setup.py build_ext). So I'm not sure how this can work on your instance at all... I'm starting to get very confused here.
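The `find` output helps explain the compiler error. `strings.cpp` does `#include "murmurhash/MurmurHash3.h"`, so the compiler needs an `-I` flag for the directory *above* `murmurhash/`, which here is `.../site-packages/murmurhash/headers`. None of the `-I` directories in the failing gcc command earlier in the thread point there. Below is a Python sketch that finds such a header and derives the directory to pass to `-I`. The function name is hypothetical.

```python
from pathlib import Path


def find_include_dirs(root, header="murmurhash/MurmurHash3.h"):
    """Return directories D under root such that D/<header> exists.

    Because the source uses a relative include path, the -I directory is
    the one above the header's own folder, not the folder itself.
    """
    rel = Path(header)
    depth = len(rel.parts)  # 2 for "murmurhash/MurmurHash3.h"
    found = []
    for path in Path(root).rglob(rel.name):
        # Keep only matches whose trailing path components equal the include path.
        if path.parts[-depth:] == rel.parts:
            # parents[0] is .../murmurhash, parents[1] is the directory to pass to -I.
            found.append(path.parents[depth - 1])
    return sorted(found)
```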

honnibal commented 9 years ago

The directory include/ should be physical, but it's okay if the python2.7 within it is a symbolic link.

I think it makes sense to log into your box, if that's still a workable option for you. Email me at honnibal@gmail.com. In the meantime, and for anyone else reading, here's a brain-dump of what I know about these issues.

The deal with headers_workaround is this: setup() takes two arguments, install_requires and setup_requires. The latter is supposed to let you declare a dependency that setup.py itself needs. For instance, we need the MurmurHash headers to compile the libraries thinc, preshed and spacy.

setuptools reads the setup_requires values and downloads those packages locally so that it can import them. Unfortunately, there's a long-standing bug: when it then goes on to read install_requires, the import of the libraries downloaded during setup_requires succeeds! These libraries then fail to be installed in the virtualenv, because setuptools thinks they're already present.

The upshot of all this is that you can't actually specify the same library in both install_requires and setup_requires! It doesn't work. I gave up hoping anyone would sort out this mess, and made my own workaround: I made a little package, headers_workaround, which I could specify as a setup_requires dependency. This copies the headers into the virtualenv. It copies two batches of headers: the MurmurHash headers, and the numpy headers. The same mechanism is used for my hash-table library, preshed, and my averaged perceptron library, thinc (which depends on preshed).
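The core idea of that workaround is to copy a dependency's headers into the environment's `include/` directory, where later compiler invocations can find them. This can be sketched in a few lines. It is a hypothetical helper written for illustration, not the real headers_workaround code, and the `src_dir` layout is an assumption.

```python
import os
import shutil
import sys


def copy_headers_into_env(src_dir, package, prefix=sys.prefix):
    """Copy <src_dir>/<package>/ to <prefix>/include/<package>/ and return the destination.

    Hypothetical helper illustrating the idea only; not headers_workaround's actual code.
    """
    src = os.path.join(src_dir, package)
    dest = os.path.join(prefix, "include", package)
    if os.path.isdir(dest):
        shutil.rmtree(dest)  # replace stale headers left by an earlier install
    shutil.copytree(src, dest)  # creates missing parent directories as needed
    return dest
```

If `<prefix>/include` resolves to a root-owned directory, `copytree` raises `PermissionError` for an unprivileged user. That is the failure mode wohali suspected above.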

Note that this only applies to installation via pip: that's the only time the dependency specifications matter. So, if you're running

python setup.py build_ext --inplace

you need to have all the dependencies installed already. But the headers_workaround is still important, because preshed and thinc use it. You can't simply control the order in which libraries are installed through requirements.txt, because the order is ignored, so the headers_workaround mechanism is still in play there.
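Since the order in requirements.txt is ignored, ordering has to be imposed from outside. The `pip install cython` / `pip install murmurhash` steps earlier in the thread do exactly that. Below is a small Python sketch of the same idea. The function name and `dry_run` flag are made up for illustration.

```python
import subprocess
import sys


def install_in_order(packages, dry_run=True):
    """Install packages one at a time, in order, with the current interpreter's pip.

    A separate pip process per package means e.g. cython is fully installed
    before murmurhash is built. With dry_run=True, only return the commands.
    """
    commands = [[sys.executable, "-m", "pip", "install", pkg] for pkg in packages]
    if not dry_run:
        for cmd in commands:
            subprocess.check_call(cmd)  # raises CalledProcessError at the first failure
    return commands
```

`install_in_order(["cython", "murmurhash"], dry_run=False)` would run the two installs in sequence, stopping if the first one fails.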

However, I'm not so sure the headers_workaround is the culprit here. The initial error looks a lot like one I had to deal with in old versions of setuptools. Old versions of setuptools had a "feature" that detected files ending in .pyx, checked whether Pyrex (the predecessor of Cython) was installed, and if it wasn't, attempted to compile a .c file, whether or not one existed. This prevented Cython from being run, and made it impossible to supply the .pyx source files in the package. What makes me suspect this or a similar issue is that your log shows it trying to compile murmurhash/mrmr.c, when the correct file would be murmurhash/mrmr.cpp. Apart from this issue, I can't see why it would try to compile a .c file.
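A simplified model of the behaviour described above shows how a source such as `murmurhash/mrmr.pyx` (an assumed file name) becomes a request to compile `murmurhash/mrmr.c`. This is not setuptools' actual source. In the model, if Cython is not detected and no C++ language is declared, the `.pyx` suffix is simply swapped for `.c`. The `language` switch is included as an assumption about how newer setuptools releases choose `.cpp`.

```python
import re


def convert_pyx_sources(sources, language=None, have_cython=False):
    """Simplified model (not setuptools' code) of rewriting .pyx sources.

    Without Cython, each .pyx name is replaced by the name of a pre-generated
    file that is assumed to exist: .cpp if language="c++" is declared, else .c.
    """
    if have_cython:
        return list(sources)  # leave .pyx files for Cython to translate
    ext = ".cpp" if (language or "").lower() == "c++" else ".c"
    return [re.sub(r"\.pyx$", ext, src) for src in sources]
```

With the defaults, `convert_pyx_sources(["murmurhash/mrmr.pyx"])` gives `["murmurhash/mrmr.c"]`, matching the file name in the failing log line.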

We've established that you're running the correct versions of virtualenv and setuptools, but could something still be misleading us? virtualenv can be flaky, and I've sometimes had old versions of libraries run unexpectedly.
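One way to catch that kind of surprise is to ask the interpreter which file each module would actually be imported from. Below is a Python 3 sketch using `importlib.util.find_spec`. The function names and the default module list are illustrative.

```python
import importlib.util
import os
import sys


def module_origins(names=("setuptools", "Cython", "murmurhash")):
    """Map each top-level module name to the file it would be imported from, or None."""
    origins = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        origins[name] = spec.origin if spec is not None else None
    return origins


def outside_env(origins, prefix=sys.prefix):
    """Return the modules whose origin is not under the environment prefix."""
    root = os.path.realpath(prefix)
    return sorted(
        name
        for name, origin in origins.items()
        if origin is not None and not os.path.realpath(origin).startswith(root + os.sep)
    )
```

Inside an activated virtualenv, `outside_env(module_origins())` should normally return an empty list. Any name it reports is being loaded from outside the environment.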

honnibal commented 8 years ago

These issues should finally be fixed.

We now have a redesigned build process, and CI running on 32-bit and 64-bit builds of Linux and Windows. The new version should be released today.

So hopefully this won't recur. Please reopen if you still experience problems, and thanks again for your patience and the time you spent debugging this for us.

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.