explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

Broken import for Icelandic language data #8214

Open elisno opened 3 years ago

elisno commented 3 years ago

How to reproduce the behaviour

It looks like importing the language data for Icelandic is broken. For example, to get the stop words:

# This works
import spacy.lang.en.stop_words
spacy.lang.en.stop_words.STOP_WORDS

# Syntax error in import statement
import spacy.lang.is.stop_words
spacy.lang.is.stop_words.STOP_WORDS

Error:

>>> import spacy.lang.is.stop_words
  File "<stdin>", line 1
    import spacy.lang.is.stop_words
                      ^
SyntaxError: invalid syntax
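The error comes from the Python parser, not from spaCy: "is" is a reserved keyword, so it cannot appear as a component of a dotted name, whether in "import spacy.lang.is.stop_words", in "from spacy.lang.is import stop_words", or in the attribute access "spacy.lang.is.stop_words.STOP_WORDS". A minimal standard-library check that only compiles these statements (it never executes them, so spaCy does not need to be installed):

```python
import keyword


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (compiled, not executed)."""
    try:
        compile(source, "<check>", "exec")
    except SyntaxError:
        return False
    return True


print(keyword.iskeyword("is"))                          # True
print(parses("import spacy.lang.en.stop_words"))        # True
print(parses("import spacy.lang.is.stop_words"))        # False
print(parses("from spacy.lang.is import stop_words"))   # False
print(parses("spacy.lang.is.stop_words.STOP_WORDS"))    # False
```

Because the failure happens at parse time, it does not depend on the installed spaCy version.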

I have yet to test this on spaCy v3.0.

Your Environment

elisno commented 3 years ago

Could this be resolved by naming the language data directory with the three-letter language code instead?

spacy/lang/is -> spacy/lang/isl
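Whether a language code can be written directly in an import statement comes down to two conditions: it must be a valid identifier and it must not be a keyword. The sketch below checks both (the helper name usable_in_import is hypothetical, not part of spaCy). It shows that "isl" would pass while "is" does not, and that "is" is not the only two-letter ISO 639-1 code with this problem: "as" (Assamese) and "or" (Odia) are keywords too, regardless of whether spaCy ships data for those languages.

```python
import keyword


def usable_in_import(code: str) -> bool:
    """Hypothetical helper: can `code` appear as a segment of a dotted import path?"""
    # keyword.iskeyword only covers hard keywords; soft keywords like "match" are fine
    return code.isidentifier() and not keyword.iskeyword(code)


for code in ["en", "de", "is", "isl", "as", "or"]:
    print(code, usable_in_import(code))
# en True, de True, is False, isl True, as False, or False
```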

svlandeg commented 3 years ago

Thanks for the report! We'll have to find a workaround, indeed.

I'm a little surprised nobody's run into this before!

elisno commented 3 years ago

Another workaround for this case is to use importlib:

import importlib

# "is" is a Python keyword, so the module path is passed as a string instead
lang_is = importlib.import_module("spacy.lang.is")
lang_is.stop_words.STOP_WORDS
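This works because the keyword restriction is enforced only by the parser: importlib.import_module receives the dotted path as an ordinary string, and the import system itself has no problem with a package directory called "is". The sketch below reproduces that without spaCy by creating a throwaway package in a temporary directory; the package name demo_pkg and its two stop words are made-up placeholders.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway layout: <tmp>/demo_pkg/is/stop_words.py
root = Path(tempfile.mkdtemp())
lang_dir = root / "demo_pkg" / "is"
lang_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(lang_dir / "__init__.py").write_text("")
# Two placeholder entries, not spaCy's actual Icelandic list
(lang_dir / "stop_words.py").write_text('STOP_WORDS = {"og", "um"}\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()  # make sure the import finders see the new files
try:
    # The dotted path is a plain string, so the parser never treats "is" as a keyword
    mod = importlib.import_module("demo_pkg.is.stop_words")
    print(sorted(mod.STOP_WORDS))  # ['og', 'um']
finally:
    sys.path.remove(str(root))
```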

BLKSerene commented 3 years ago

In my project, I need to fetch the stop words for every language provided by spaCy, so I already use the importlib approach with an f-string and did not run into this issue. Using a three-letter code only for Icelandic (which has the two-letter ISO 639-1 code "is") would be inconsistent; the alternative would be for spaCy to use three-letter codes for all languages.
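The approach described in this comment, building each module path with an f-string and passing it to importlib.import_module, might look like the following sketch. The function name load_stop_words and its base parameter are hypothetical; the default base="spacy.lang" assumes the spacy/lang/<code>/stop_words.py layout discussed in this thread, and the caller supplies the list of language codes rather than discovering it from spaCy.

```python
import importlib
from typing import Dict, Iterable, Set


def load_stop_words(codes: Iterable[str], base: str = "spacy.lang") -> Dict[str, Set[str]]:
    """Hypothetical helper: map each language code to a copy of its STOP_WORDS set.

    The module path is built as a string, so a code that is a Python
    keyword (such as "is") is handled exactly like any other code.
    """
    result: Dict[str, Set[str]] = {}
    for code in codes:
        module = importlib.import_module(f"{base}.{code}.stop_words")
        result[code] = set(module.STOP_WORDS)
    return result


# Usage, assuming spaCy is installed:
# stop_words = load_stop_words(["en", "de", "is"])
# print(len(stop_words["is"]))
```

Since every language goes through the same string-based path, this pattern never hits the keyword problem, which is consistent with the observation above.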