explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License
29.87k stars 4.38k forks

PyInstaller cannot package application with spaCy #2536

Closed lemoncalamitous closed 6 years ago

lemoncalamitous commented 6 years ago

How to reproduce the problem

I am currently using conda. I've created a clean environment with the following yml file:

name: *environment name*
channels:
- defaults
dependencies:
- astroid=1.5.3=py35_0
- coverage=4.4.1=py35_0
- cycler=0.10.0=py35_0
- cython=0.26=py35_0
- graphviz=2.38.0=4
- icu=57.1=vc14_0
- isort=4.2.15=py35_0
- jpeg=9b=vc14_0
- lazy-object-proxy=1.3.1=py35_0
- libpng=1.6.30=vc14_1
- matplotlib=2.0.2=np113py35_0
- mkl=2017.0.3=0
- nltk=3.2.4=py35_0
- numpy=1.13.1=py35_0
- openssl=1.0.2l=vc14_0
- pandas=0.20.3=py35_0
- pip=9.0.1=py35_1
- pycrypto=2.6.1=py35_6
- pylint=1.7.2=py35_0
- pyodbc=4.0.17=py35_0
- pyparsing=2.2.0=py35_0
- pyqt=5.6.0=py35_2
- python=3.5.4=0
- python-dateutil=2.6.1=py35_0
- pytz=2017.2=py35_0
- pywin32=220=py35_2
- qt=5.6.2=vc14_6
- requests=2.14.2=py35_0
- scikit-learn=0.19.0=np113py35_0
- scipy=0.19.1=np113py35_0
- setuptools=27.2.0=py35_1
- singledispatch=3.4.0.3=py35_0
- sip=4.18=py35_0
- six=1.10.0=py35_1
- tornado=5.0=py35_0
- tk=8.5.18=vc14_0
- vs2015_runtime=14.0.25420=0
- wheel=0.29.0=py35_0
- wrapt=1.10.11=py35_0
- zlib=1.2.11=vc14_0
- pip:
  - future==0.16.0
  - pyinstaller==3.2.1
  - pypiwin32==219
  - talon==1.4.4
  - langdetect==1.0.7
prefix: C:\ProgramData\Anaconda3\envs\*environment name*

I then tried installing spaCy 2.0.11 in that environment using both pip and conda:

pip install -U spacy
and
conda install -c conda-forge spacy 

In PyInstaller, I added the following --hidden-import flags, since these modules cannot be imported from the compiled application:

--hidden-import cymem.cymem
--hidden-import thinc.linalg
--hidden-import murmurhash.mrmr
--hidden-import cytoolz.utils
--hidden-import cytoolz._signatures
--hidden-import spacy.strings
--hidden-import spacy.morphology
--hidden-import spacy.lexeme
--hidden-import spacy.tokens.underscore

Here's the error message that I cannot resolve:

  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
"c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\spacy\__init__.py", line 4, in <module>
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\spacy\cli\__init__.py", line 6, in <module>
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\spacy\cli\train.py", line 12, in <module>
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 714, in load_module
    module = loader.load_module(fullname)
  File "morphology.pxd", line 25, in init spacy.gold
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 714, in load_module
    module = loader.load_module(fullname)
  File "vocab.pxd", line 27, in init spacy.morphology
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 714, in load_module
    module = loader.load_module(fullname)
  File "tokens/doc.pxd", line 30, in init spacy.vocab
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 631, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\spacy\tokens\__init__.py", line 1, in <module>
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 714, in load_module
    module = loader.load_module(fullname)
  File "token.pxd", line 12, in init spacy.tokens.doc
  File "c:\users\*my_id*\appdata\local\continuum\anaconda3\envs\*my_environment_name*\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 714, in load_module
    module = loader.load_module(fullname)
  File "token.pyx", line 15, in init spacy.tokens.token
ImportError: cannot import name parts_of_speech

I am not sure whether the issue is with PyInstaller or spaCy. Any help is appreciated. Thanks!

My Environment

lemoncalamitous commented 6 years ago

I have verified that the issue I am facing is a PyInstaller issue. You must use PyInstaller's dev branch to compile your application.

pip install https://github.com/pyinstaller/pyinstaller/archive/develop.zip

Also, when compiling your application with spaCy using PyInstaller, you must use hidden imports such as:

--hidden-import cymem.cymem
--hidden-import thinc.linalg
--hidden-import murmurhash.mrmr
--hidden-import cytoolz.utils
--hidden-import cytoolz._signatures
--hidden-import spacy.strings
--hidden-import spacy.morphology
--hidden-import spacy.lexeme
--hidden-import spacy.tokens
--hidden-import spacy.gold
--hidden-import spacy.tokens.underscore
--hidden-import spacy.parts_of_speech
--hidden-import dill
--hidden-import spacy.tokens.printers
--hidden-import spacy.tokens._retokenize
--hidden-import spacy.syntax
--hidden-import spacy.syntax.stateclass
--hidden-import spacy.syntax.transition_system
--hidden-import spacy.syntax.nonproj
--hidden-import spacy.syntax.nn_parser
--hidden-import spacy.syntax.arc_eager
--hidden-import thinc.extra.search
--hidden-import spacy.syntax._beam_utils
--hidden-import spacy.syntax.ner
--hidden-import thinc.neural._classes.difference
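
A long list of flags like this is easier to maintain from a small Python script than on the command line. The following is only a sketch: the script name my_app.py and the helper build_pyinstaller_args are hypothetical, and the module list is abbreviated from the one above.

```python
# Abbreviated subset of the hidden imports listed in this thread; extend as needed.
HIDDEN_IMPORTS = [
    "cymem.cymem",
    "thinc.linalg",
    "murmurhash.mrmr",
    "spacy.strings",
    "spacy.parts_of_speech",
]


def build_pyinstaller_args(script, hidden_imports):
    """Return an argument list equivalent to repeating --hidden-import on the CLI."""
    args = [script]
    for module in hidden_imports:
        args += ["--hidden-import", module]
    return args


if __name__ == "__main__":
    # my_app.py is a hypothetical entry-point script.
    args = build_pyinstaller_args("my_app.py", HIDDEN_IMPORTS)
    print(" ".join(args))
    # To start the build from Python instead of the shell (requires PyInstaller):
    # import PyInstaller.__main__
    # PyInstaller.__main__.run(args)
```

Keeping the modules in one list means a newly discovered missing import is a one-line change.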

Compiling a Python script that uses spaCy into an executable then succeeded on my end. However, when the executable uses spaCy, it cannot find the spaCy data directory:

[E049] Can't find spaCy data directory: 'None'. Check your installation and permissions, or use spacy.util.set_data_path to customise the location if necessary.

I use spacy.load('path to my pretrained model from a config file'). It runs under plain Python, but not from the executable. Any thoughts on this?

ines commented 6 years ago

It's possible that this is a permissions / environment thing, or maybe it's because the default data directory in spacy/data wasn't included in your executable? In any case, you can always set the path manually via the set_data_path helper:

from spacy.util import set_data_path

set_data_path('/path/to/spacy/data')

lemoncalamitous commented 6 years ago

Thanks @ines for the recommendation! It looks like passing the data path from the config file to the set_data_path function worked.
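
For readers packaging a similar application, here is a rough sketch of resolving such a path so that it works both under plain Python and inside the executable. It relies on the sys.frozen and sys._MEIPASS attributes that PyInstaller's bootloader sets. The helper names and the folder name spacy_data are hypothetical, and the sketch assumes the data folder was actually added to the bundle (for example with PyInstaller's --add-data option).

```python
import os
import sys


def bundle_base_dir(dev_base):
    """Return the bundle directory when running frozen, else dev_base."""
    if getattr(sys, "frozen", False):
        # Fall back to the executable's folder if _MEIPASS is not set.
        return getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    return dev_base


def resolve_data_path(relative_path, dev_base):
    """Join a relative path (e.g. from a config file) onto the right base."""
    return os.path.join(bundle_base_dir(dev_base), relative_path)


if __name__ == "__main__":
    # "spacy_data" is a hypothetical folder name read from a config file.
    data_path = resolve_data_path("spacy_data", os.getcwd())
    print(data_path)
    # With spaCy 2.x installed, the helper mentioned in this thread would be:
    # from spacy.util import set_data_path
    # set_data_path(data_path)
```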

But as I continued packaging my software, another error occurred:

[E048] Can't import language en from spacy.lang.

My current model has a meta.json file that has the following values:

{
  "lang": "en",
  "name": "model",
  "version": "0.0.0",
  "spacy_version": ">=2.0.11",
  "description": "",
  "author": "",
  "email": "",
  "url": "",
  "license": "",
  "vectors": {
    "width": 300,
    "vectors": 1250792,
    "keys": 999994,
    "name": null
  },
  "pipeline": []
}

Any thoughts on this?

ines commented 6 years ago

That's very strange. The meta is correct. The error seems to occur here in get_lang_class when spaCy is trying to import spacy.lang.en dynamically.

Maybe some of the modules weren't packaged correctly? If you run the following code manually, does it produce the same error?

import spacy.lang.en  # import module
from spacy.lang.en import English  # import something from module

# use the get_lang_class helper
from spacy.util import get_lang_class
cls = get_lang_class('en')

You could also check that all files are where they should be. For example, spacy/lang/en should exist and have an __init__.py, etc.
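
One way to run this kind of check from inside the packaged application is to ask the import system whether it can locate each module. This is a minimal sketch using only the standard library; the helper names are hypothetical, and whether a given frozen build reports its modules this way is an assumption worth verifying.

```python
import importlib.util


def module_is_importable(name):
    """Return True if the import system can locate the module `name`.

    find_spec imports any parent packages but does not execute the
    target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


def missing_modules(names):
    """Return the subset of `names` that cannot be located."""
    return [name for name in names if not module_is_importable(name)]


if __name__ == "__main__":
    # Module names taken from this thread.
    print(missing_modules(["spacy.lang.en", "spacy.parts_of_speech"]))
```

Running it once under plain Python and once in the executable shows which modules were left out of the bundle.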

lemoncalamitous commented 6 years ago

Hi @ines. I tried the code you recommended and it works.

import spacy.lang.en
from spacy.lang.en import English
from spacy.util import get_lang_class
cls = get_lang_class('en')
cls
<class 'spacy.lang.en.English'>

Based on my observation, it seems that (again) it's PyInstaller's fault. I use four fastText models that all have English meta, so I tried adding a hidden import again:

--hidden-import spacy.lang.en

After packaging my application with all of these hidden imports, it finally worked! If I were to use other languages, creating a hook file named hook-spacy.lang.py that collects all language submodules would work better. Just place this hook file inside the PyInstaller hooks folder of your Python environment.

from PyInstaller.utils.hooks import collect_submodules

hiddenimports = collect_submodules('spacy.lang')
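
If you would rather not modify the PyInstaller installation, the same hook can live in a project-local folder that is passed to PyInstaller with its --additional-hooks-dir option. This is only a sketch: the folder name hooks, the script name my_app.py, and both helper functions are hypothetical.

```python
import os

# Same hook body as shown above in this comment.
HOOK_SOURCE = (
    "from PyInstaller.utils.hooks import collect_submodules\n"
    "\n"
    "hiddenimports = collect_submodules('spacy.lang')\n"
)


def write_hook(hooks_dir):
    """Create hooks_dir if needed and write hook-spacy.lang.py into it."""
    os.makedirs(hooks_dir, exist_ok=True)
    hook_path = os.path.join(hooks_dir, "hook-spacy.lang.py")
    with open(hook_path, "w", encoding="utf-8") as handle:
        handle.write(HOOK_SOURCE)
    return hook_path


def build_args(script, hooks_dir):
    """Return PyInstaller arguments that point at the local hooks folder."""
    return [script, "--additional-hooks-dir", hooks_dir]


if __name__ == "__main__":
    hooks_dir = os.path.join(os.getcwd(), "hooks")  # hypothetical location
    write_hook(hooks_dir)
    print(" ".join(build_args("my_app.py", hooks_dir)))
```

The hook then travels with the project, so a fresh environment does not need manual changes.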

Hope that this will also help other guys who use spaCy on their packaged application. Thanks for the support!

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.