aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Downloader: Russian collection is not complete #180

Open burtsev-cpu opened 5 years ago

burtsev-cpu commented 5 years ago

Win 7-64, Python 3.7.2. While downloading the Russian collection via:

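from polyglot.downloader import downloader

def use_polyglot(stb):
    # List the languages the downloader knows about, then fetch the Russian collection
    downloader.supported_languages()
    downloader.download('LANG:ru')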

I get the following response:

[polyglot_data] Downloading collection 'LANG:ru'
[polyglot_data]    |
[polyglot_data]    | Downloading package to C:\Users\Александр\AppDat
[polyglot_data]    |     a\Roaming\polyglot_data...
[polyglot_data]    |   Package is already up-to-date!
[polyglot_data]    |
[polyglot_data] Done downloading collection LANG:ru
Traceback (most recent call last):
  File "C:\Miniconda3\Stolen_bikes\Velorozysk_parser_v2.py", line 93, in <module>
    main()
  File "C:\Miniconda3\Stolen_bikes\Velorozysk_parser_v2.py", line 87, in main
    use_polyglot(stb)
  File "C:\Miniconda3\Stolen_bikes\Velorozysk_parser_v2.py", line 68, in use_polyglot
    print(text.entities)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\text.py", line 132, in entities
    for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\text.py", line 100, in ne_chunker
    return get_ner_tagger(lang=self.language.code)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\tag\base.py", line 191, in get_ner_tagger
    return NEChunker(lang=lang)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\tag\base.py", line 104, in __init__
    super(NEChunker, self).__init__(lang=lang)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
    self.predictor = self._load_network()
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\tag\base.py", line 109, in _load_network
    self.embeddings = load_embeddings(self.lang, type='cw', normalize=True)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\load.py", line 61, in load_embeddings
    p = locate_resource(src_dir, lang)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\load.py", line 43, in locate_resource
    if downloader.status(package_id) != downloader.INSTALLED:
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\downloader.py", line 737, in status
    info = self._info_or_id(info_or_id)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\downloader.py", line 507, in _info_or_id
    return self.info(info_or_id)
  File "C:\Miniconda3\envs\icutestenv\lib\site-packages\polyglot\downloader.py", line 933, in info
    raise ValueError('Package %r not found in index' % id)
ValueError: Package 'embeddings2.ru' not found in index.

It seems the collection is not complete. The solutions discussed in #26 are not helpful in my case (access forbidden). Are there any other ways of downloading the missing package?
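
For anyone trying to narrow this down, here is a rough sketch of how one might check which package ids the index is missing before calling `text.entities`. It relies only on what the traceback shows: `downloader.status(package_id)`, `downloader.INSTALLED`, and the `ValueError` raised when an id is not in the index. The helper `missing_packages`, the `FakeDownloader` stand-in, and the id `ner2.ru` are assumptions made for illustration; only `embeddings2.ru` comes from the error above.

```python
def missing_packages(dl, package_ids):
    """Return the ids that are absent from the index or not installed.

    `dl` is any object exposing status() and INSTALLED the way the
    traceback shows polyglot's downloader doing (an assumption here).
    """
    missing = []
    for package_id in package_ids:
        try:
            installed = dl.status(package_id) == dl.INSTALLED
        except ValueError:
            # Same failure as in the traceback: the id is not in the index.
            installed = False
        if not installed:
            missing.append(package_id)
    return missing


class FakeDownloader:
    """Hypothetical stand-in that mimics only the behaviour seen in the traceback."""

    INSTALLED = "installed"

    def __init__(self, index):
        self._index = index  # maps package id -> status string

    def status(self, package_id):
        if package_id not in self._index:
            raise ValueError('Package %r not found in index' % package_id)
        return self._index[package_id]


# 'ner2.ru' is an assumed id used only to show an installed package.
fake = FakeDownloader({"ner2.ru": "installed"})
print(missing_packages(fake, ["embeddings2.ru", "ner2.ru"]))
```

With a real installation the same helper could presumably be called as `missing_packages(downloader, [...])` after `from polyglot.downloader import downloader`, though that call may need network access to fetch the index.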