Closed. Motorrat closed this issue 5 years ago.
Hey @Motorrat, thanks for reporting this. I'm honestly not sure what most users would want when automatic language detection fails (e.g. returns "un" for "unknown") and they don't know a priori what the language should be. It seems like
try:
    doc = textacy.Doc(text)
except IOError:
    # do something else
    doc = None
would be a safer bet than applying a default language model to whatever text fails language detection. This also works for languages that are correctly detected but don't have models available, right?
Would it be sufficient to mention this somewhere in the docs, and point folks to the external spacy model download/linking documentation?
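For a collection of documents, the try/except approach above could be wrapped in a small helper. This is only a sketch: `docs_skipping_failures` and `make_doc` are hypothetical names, not part of textacy, and the doc constructor is passed in as a callable so the helper does not depend on a particular textacy version.

```python
def docs_skipping_failures(texts, make_doc):
    """Build one doc per text; collect the texts whose model could not be loaded.

    `make_doc` is any callable that turns a text into a doc, e.g. textacy.Doc
    (assumption: it raises IOError/OSError when the spaCy model is missing).
    """
    docs, failed = [], []
    for text in texts:
        try:
            docs.append(make_doc(text))
        except IOError:  # alias of OSError in Python 3; covers spaCy's E050
            failed.append(text)
    return docs, failed
```

Calling `docs, failed = docs_skipping_failures(texts, textacy.Doc)` would then leave the undetected or unsupported texts in `failed` for separate handling instead of stopping the program.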
Just added details to the docs! Hope that helps.
Textacy fails to detect the language of some texts, which causes spaCy to throw an exception for a missing model name. A fix does not seem to be documented yet.
Expected Behavior
Tell the user in the documentation about the possibility of creating a symlink for the 'un' language, for example
ln -s /home/user/venv/full/lib/python3.6/site-packages/de_core_news_sm /home/user/venv/full/lib/python3.6/site-packages/spacy/data/un
or take a default_lang parameter in the textacy.Doc(text) signature: textacy.Doc(text, default_lang='de')
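Until such a parameter exists, the requested fallback could be approximated with a small wrapper. This is a minimal sketch, not textacy's API: `doc_with_default_lang` and `make_doc` are hypothetical names, and it assumes that `textacy.Doc` accepts a `lang` keyword, as the `_init_from_text(content, metadata, lang)` call in the stack trace below suggests.

```python
def doc_with_default_lang(text, make_doc, default_lang="de"):
    """Try automatic language detection first, then retry with a default language.

    `make_doc` is a callable such as textacy.Doc (assumption: it accepts a
    `lang` keyword and raises IOError/OSError when no model can be loaded).
    """
    try:
        return make_doc(text)
    except IOError:
        # Detection failed (e.g. 'un') or no model is installed for the
        # detected language, so fall back to the caller's default.
        return make_doc(text, lang=default_lang)
```

With textacy installed, the call would look like `doc = doc_with_default_lang(text, textacy.Doc, default_lang="de")`. If the model for the default language is missing as well, the second call still raises, which seems preferable to silently hiding a broken setup.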
Current Behavior
Currently it fails with the following stack trace:
    doc = textacy.Doc(text)
  File "/home/user/venv/full/lib/python3.6/site-packages/textacy/doc.py", line 114, in __init__
    self._init_from_text(content, metadata, lang)
  File "/home/user/venv/full/lib/python3.6/site-packages/textacy/doc.py", line 136, in _init_from_text
    spacy_lang = cache.load_spacy(langstr)
  File "/home/user/venv/full/lib/python3.6/site-packages/cachetools/__init__.py", line 46, in wrapper
    v = func(*args, **kwargs)
  File "/home/user/venv/full/lib/python3.6/site-packages/textacy/cache.py", line 99, in load_spacy
    return spacy.load(name, disable=disable)
  File "/home/user/venv/full/lib/python3.6/site-packages/spacy/__init__.py", line 21, in load
    return util.load_model(name, **overrides)
  File "/home/user/venv/full/lib/python3.6/site-packages/spacy/util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'un'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Context
I'm working with a collection of documents in various languages. The program stops on this exception, whereas I would rather continue by falling back to a default language for that document.
Your Environment
textacy.utils.print_markdown(textacy.utils.get_config())