explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Model deserialization and Windows installation issues #1421

Closed parvathysarat closed 7 years ago

parvathysarat commented 7 years ago

I installed spacy using pip and have been trying to download the language models. However, $ python -m spacy download es yields:

Compatibility error
No compatible model found for 'es_core_web_md' (spaCy v2.0.0a17).

While trying to download the English model, both $ pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-1.2.0/en_core_web_sm-1.2.0.tar.gz and $ python -m spacy download en yield errors:

Collecting https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0a7/en_core_web_sm-2.0.0a7.tar.gz
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0a7/en_core_web_sm-2.0.0a7.tar.gz (36.4MB)
Exception:
Traceback (most recent call last):
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\commands\install.py", line 324, in run
    requirement_set.prepare_files(finder)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\req\req_set.py", line 380, in prepare_files
    ignore_dependencies=self.ignore_dependencies))
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\req\req_set.py", line 620, in _prepare_file
    session=self.session, hashes=hashes)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 821, in unpack_url
    hashes=hashes
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 659, in unpack_http_url
    hashes)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 882, in _download_http_url
    _download_url(resp, link, content_file, hashes)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 605, in _download_url
    consume(downloaded_chunks)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\utils\__init__.py", line 852, in consume
    deque(iterator, maxlen=0)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 571, in written_chunks
    for chunk in chunks:
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\utils\ui.py", line 139, in iter
    for x in it:
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\download.py", line 560, in resp_read
    decode_content=False):
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\_vendor\requests\packages\urllib3\response.py", line 357, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\_vendor\requests\packages\urllib3\response.py", line 324, in read
    flush_decoder = True
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\pip\_vendor\requests\packages\urllib3\response.py", line 246, in _error_catcher
    raise ReadTimeoutError(self._pool, None, 'Read timed out.')
ReadTimeoutError: HTTPSConnectionPool(host='github-production-release-asset-2e65be.s3.amazonaws.com', port=443): Read timed out.

$ python -m spacy.en.download all
C:\Users\PARVATHY SARAT\Anaconda2\python.exe: cannot import name fix_glove_vectors_loading

What do these issues mean and how do I rectify them? I tried downloading over both office and home Wi-Fi, so I'm not sure it's a connection error. OS: Windows 10. Thanks a lot in advance.

ines commented 7 years ago

Hey! I think the first problem is quite simple: since we've only recently added more models to spaCy v2.0 alpha, the shortcut es doesn't yet point to the correct model. It's trying to load the es_core_web_md model, whereas the new Spanish model is a sm model. (This will definitely be fixed for the stable release, when we know which models are going to be available.)

In the meantime, you can simply download the model explicitly, e.g.:

spacy download es_core_web_sm

You can find the exact download command in the alpha models directory, in the right sidebar next to each model listing.

About the second error: Hmm, this looks like the download timed out, so this might indeed have something to do with your connection. Are you able to download the model archive file directly via your browser using this link? https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0a7/en_core_web_sm-2.0.0a7.tar.gz The model is only around 35 MB. Once you've downloaded the file, you can just install the model from the local path, for example:

pip install /path/to/downloads/en_core_web_sm-2.0.0a7.tar.gz

# link model to the shortcut 'en' – normally, this is performed by "spacy download", 
# but since you're installing it manually, you need to do this yourself if you want to 
# load the model as "en"
spacy link en_core_web_sm en 
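
If you'd rather script those two steps, something along these lines might work. It's only a sketch: manual_install_commands is a made-up helper name, the archive path is a placeholder, and --default-timeout is pip's socket timeout option, which may help if the timeout happens while pip itself is downloading something.

```python
import sys


def manual_install_commands(source, package="en_core_web_sm", shortcut="en", timeout=120):
    """Build the commands for installing a model archive by hand.

    `source` can be a local archive path or a download URL. The result is a
    list of argument lists that could be passed to subprocess.check_call.
    """
    return [
        # Install the archive; --default-timeout raises pip's socket timeout.
        [sys.executable, "-m", "pip", "install",
         "--default-timeout={}".format(timeout), source],
        # Recreate the shortcut link that "spacy download" normally sets up.
        [sys.executable, "-m", "spacy", "link", package, shortcut],
    ]


# Placeholder path: point this at wherever the browser saved the archive.
for command in manual_install_commands("/path/to/downloads/en_core_web_sm-2.0.0a7.tar.gz"):
    print(" ".join(command))
```

This only prints the commands so you can check them first. Passing each list to subprocess.check_call would actually run them.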

The python -m spacy.en.download all command has been deprecated since v1.7.0 (and only points to the old models for spaCy v1.6.0 anyway), so this definitely won't work. (It should say this in the error message though, so there might be a bug here – will check and fix this if necessary.)

parvathysarat commented 7 years ago

Hey, thanks a ton for replying. I was able to install the English model using $ python -m spacy download en, which after downloading gave me the message "You can now load the model via spacy.load('en')". However, using IPython:

import spacy
nlp=spacy.load('en')
AttributeError                            Traceback (most recent call last)
<ipython-input-5-a32b6d2b36d8> in <module>()
----> 1 nlp=spacy.load('en')

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\__init__.pyc in load(name, **overrides)
     13     from .deprecated import resolve_load_name
     14     name = resolve_load_name(name, **overrides)
---> 15     return util.load_model(name, **overrides)
     16
     17

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model(name, **overrides)
    102     if isinstance(name, basestring_):
    103         if name in set([d.name for d in data_path.iterdir()]): # in data dir / shortcut
--> 104             return load_model_from_link(name, **overrides)
    105         if is_package(name): # installed as package
    106             return load_model_from_package(name, **overrides)

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_link(name, **overrides)
    121             "Cant' load '%s'. If you're using a shortcut link, make sure it "
    122             "points to a valid model package (not just a data directory)." % name)
--> 123     return cls.load(**overrides)
    124
    125

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\data\en\__init__.pyc in load(**overrides)
     10
     11 def load(**overrides):
---> 12     return load_model_from_init_py(__file__, **overrides)

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_init_py(init_file, **overrides)
    165     if not model_path.exists():
    166         raise ValueError("Can't find model directory: %s" % path2str(data_path))
--> 167     return load_model_from_path(data_path, meta, **overrides)
    168
    169

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in load_model_from_path(model_path, meta, **overrides)
    148             component = nlp.create_pipe(name, config=config)
    149             nlp.add_pipe(component, name=name)
--> 150     return nlp.from_disk(model_path)
    151
    152

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\language.pyc in from_disk(self, path, disable)
    571         if not (path / 'vocab').exists():
    572             exclude['vocab'] = True
--> 573         util.from_disk(path, deserializers, exclude)
    574         return self
    575

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in from_disk(path, readers, exclude)
    495     for key, reader in readers.items():
    496         if key not in exclude:
--> 497             reader(path / key)
    498     return path
    499

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\language.pyc in <lambda>(p)
    558         path = util.ensure_path(path)
    559         deserializers = OrderedDict((
--> 560             ('vocab', lambda p: self.vocab.from_disk(p)),
    561             ('tokenizer', lambda p: self.tokenizer.from_disk(p, vocab=False)),
    562             ('meta.json', lambda p: p.open('w').write(json_dumps(self.meta)))

vocab.pyx in spacy.vocab.Vocab.from_disk()

vectors.pyx in spacy.vectors.Vectors.from_disk()

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\spacy\util.pyc in from_disk(path, readers, exclude)
    495     for key, reader in readers.items():
    496         if key not in exclude:
--> 497             reader(path / key)
    498     return path
    499

vectors.pyx in spacy.vectors.Vectors.from_disk.load_keys()

C:\Users\PARVATHY SARAT\Anaconda2\lib\site-packages\numpy\lib\npyio.pyc in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
    389         _ZIP_PREFIX = asbytes('PK\x03\x04')
    390         N = len(format.MAGIC_PREFIX)
--> 391         magic = fid.read(N)
    392         fid.seek(-N, 1)  # back-up
    393         if magic.startswith(_ZIP_PREFIX):

AttributeError: 'WindowsPath' object has no attribute 'read'

I have the en and es models downloaded in my working directory, what does this error message mean? Thanks again!

ines commented 7 years ago

Ah, sorry you're having so many problems!

I have the en and es models downloaded in my working directory, what does this error message mean?

Interesting, this looks like an issue during deserialization, i.e. when the binary data of the model is loaded. Down the line in numpy, it seems to be unable to load from a WindowsPath... I'm surprised this hasn't come up before!

Just had a look at how this is handled in the numpy source and it doesn't detect the WindowsPath as a Path and thus doesn't open the file. So in spaCy, I think we should be able to prevent this problem by always passing in a string for path / key to the reader. (Sorry for the dump of random specifics, just writing this down for the bugfix. I've already made the change on develop and it will be included in the next alpha release).
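
For reference, the change described above could look roughly like this. It's a simplified sketch modelled on the from_disk helper visible in the traceback, not the actual commit, and it assumes every reader accepts a plain string path.

```python
from collections import OrderedDict
from pathlib import Path


def from_disk(path, readers, exclude=()):
    """Call each reader with a string path instead of a Path object.

    Simplified stand-in for the helper shown in the traceback: the only
    difference is the str() conversion before the reader is called.
    """
    path = Path(path)
    for key, reader in readers.items():
        if key not in exclude:
            # Convert to str so code that doesn't recognise Path objects
            # treats the argument as a file name, not as an open file.
            reader(str(path / key))
    return path


# Toy readers that just record what they were given.
received = []
readers = OrderedDict([("vocab", received.append), ("tokenizer", received.append)])
from_disk("model_dir", readers, exclude={"tokenizer"})
print(received)
```

The toy readers only record their argument, so this shows the type that reaches them and nothing about real model loading.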

There's not really an easy solution or workaround for you at the moment – except for switching to Python 3.6, which shouldn't have this problem.

parvathysarat commented 7 years ago

Okay, thanks! Uninstalling and installing all of it again now. Btw, do you think not properly installing Microsoft Visual C++ 14.0 could have been the issue? I initially got an error telling me to install it, which I did, and then I was able to install spacy. But I think I may have installed a leaner/improper version of it the first time. Would that cause the error 'WindowsPath' object has no attribute 'read'?

ines commented 7 years ago

In this case, I'd say it's unlikely – but then again, with Windows compilers, you never really know. Unfortunately, this stuff is still pretty tricky, and probably the number one source of issues for Windows users. (So it's good to keep this in mind in case you end up having more problems later on.)

Providing spaCy on conda has made a big difference, though – so once spaCy v2.0.0 stable is released, you'll also be able to download and install it straight from there.

parvathysarat commented 7 years ago

Guess I'll have to see what I can do until it's out for download. After switching to Python 3.6, I was able to download spacy using the Visual C++ command prompt. But now I'm back to an error when downloading the English model: command 'cl.exe' failed: No such file or directory

I'm sure there are others who have been successful doing this? Too much energy and hopes on spacy, need to solve this somehow.

ines commented 7 years ago

I just did a quick search for that error and found this thread on StackOverflow: https://stackoverflow.com/questions/41724445/python-pip-on-windows-command-cl-exe-failed

It has some solutions and an accepted answer, so maybe this is helpful? The problem seems common enough, so there are also several other threads on (likely) the same issue.
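
That error usually means the Visual C++ compiler couldn't be found on the PATH when pip tried to build an extension. A quick way to check from Python 3, as a rough sketch (find_compiler is just an illustrative helper name):

```python
import shutil


def find_compiler(name="cl"):
    """Return the full path of the compiler executable on PATH, or None."""
    # On Windows, shutil.which also tries the extensions in PATHEXT,
    # so "cl" matches cl.exe.
    return shutil.which(name)


location = find_compiler()
if location is None:
    print("cl is not on PATH in this shell")
else:
    print("Found compiler at", location)
```

If it reports that the compiler is missing, running pip from a Visual C++ command prompt, where the compiler is normally on the PATH, may be worth a try.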

parvathysarat commented 7 years ago

I tried and retried the solutions suggested for the cl.exe issue, but apart from new and old errors cropping up, I couldn't make progress. Hence I've switched to Ubuntu! I could import spacy (Python 2.7) until I downloaded and installed the English model (the way you described above). Now the error seems to be:

import spacy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/parvathy/.local/lib/python2.7/site-packages/spacy/__init__.py", line 10, in <module>
    from . import en, de, zh, es, it, hu, fr, pt, nl, sv, fi, bn, he, nb, ja
  File "/home/parvathy/.local/lib/python2.7/site-packages/spacy/en/__init__.py", line 4, in <module>
    from ..language import Language
  File "/home/parvathy/.local/lib/python2.7/site-packages/spacy/language.py", line 14, in <module>
    from .pipeline import DependencyParser, EntityRecognizer
  File "spacy/pipeline.pyx", line 1, in init spacy.pipeline (spacy/pipeline.cpp:16536)
  File ".env/lib/python2.7/site-packages/thinc/extra/search.pxd", line 72, in init spacy.syntax.beam_parser (spacy/syntax/beam_parser.cpp:20037)
ValueError: thinc.extra.search.MaxViolation has the wrong size, try recompiling

I tried sudo pip install thinc==6.8.1 followed by installing the model again, but the error persists. Any thoughts? Thanks in advance, as always.

ines commented 7 years ago

The correct Thinc version for spaCy nightly 2.0.0a17 is definitely 6.9.0 – so if this is the combination you have and you've installed everything from scratch in a clean environment on Ubuntu, and used the latest version of the model, this should all be fine. Sorry it still isn't working – after all the stress so far, you definitely deserve better!

To help us debug, could you post the result of spacy info --markdown? And just to be safe, when you run spacy validate on the command line, does it show all models as green and up to date?
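
If you want a quick look at the installed versions from Python itself, a minimal sketch could look like this. It assumes Python 3.8 or newer for importlib.metadata, and the distribution names (spacy, spacy-nightly, thinc) are my guess at what's relevant here. It is not what spacy info does internally.

```python
from importlib import metadata


def installed_versions(names=("spacy", "spacy-nightly", "thinc")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution isn't installed in this environment.
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(name, version or "not installed")
```

On older Python versions, pip show thinc gives the same kind of information.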

parvathysarat commented 7 years ago

It worked after I redid the whole thing in a virtual environment! Thanks a lot for the help :)

ines commented 7 years ago

Yessssss! 🎉🙏

jianzhengming commented 6 years ago

I have a related issue. I downloaded the 'en' model with $ python3 -m spacy download en, which yields:

Linking successful
/home/abc/miniconda3/lib/python3.6/site-packages/en_core_web_sm --> /home/abc/miniconda3/lib/python3.6/site-packages/spacy/data/en

You can now load the model via spacy.load('en')

However, when I run nlp = spacy.load('en'), I still get the error "OSError: Can't find model 'en'".
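
One thing that might be worth ruling out (an assumption, not a confirmed cause): the python3 that ran the download and the interpreter running spacy.load may not be the same installation. A small check, where describe_environment is a hypothetical helper:

```python
import importlib.util
import sys


def describe_environment(package="en_core_web_sm"):
    """Report which interpreter is running and whether it can see the package."""
    return {
        # Full path of the Python executable running this code.
        "python": sys.executable,
        # True if the package is importable from this interpreter.
        "installed": importlib.util.find_spec(package) is not None,
    }


print(describe_environment())
```

If the reported interpreter is not the miniconda3 one from the linking message, or the package shows as not installed, the model and the session are probably in different environments.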
