explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

NER training warning [W033] after spacy-lookups-data loaded #6225

Closed fcggamou closed 4 years ago

fcggamou commented 4 years ago

I'm getting "UserWarning: [W033] Training a new parser or NER using a model with no lexeme normalization table. This may degrade the performance of the model to some degree. If this is intentional or the language you're using doesn't have a normalization table, please ignore this warning. If this is surprising, make sure you have the spacy-lookups-data package installed. The languages with lexeme normalization tables are currently: da, de, el, en, id, lb, pt, ru, sr, ta, th."

This happens even after installing spacy-lookups-data, which contains the Spanish lemmatization table. The lemmatization itself seems to be working; if I try e.g.:

import spacy

nlp = spacy.load('es_core_news_md')
for tok in nlp('comiendo'):
    print(tok.lemma_)

I correctly get the result "comer".
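
To double-check what the model actually has loaded, the lookup table names can be listed directly. A minimal sketch, assuming spaCy 2.3's Lookups API (the tables property and has_table method on nlp.vocab.lookups) and the table name 'lexeme_norm' used by spacy-lookups-data:

import spacy

nlp = spacy.load('es_core_news_md')

# List all lookup tables attached to the vocab (lemma tables, norm tables, ...).
print(nlp.vocab.lookups.tables)

# The warning mentions a "normalization table"; check whether one is present.
print(nlp.vocab.lookups.has_table('lexeme_norm'))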

How to reproduce the behaviour

!pip install spacy-lookups-data
!pip install spacy==2.3.1
!python -m spacy train 'es' 'my_ner' 'train.json' 'test.json' --base-model='es_core_news_md' --pipeline='ner' -g=0 -ne=3 -n=500 -nl=0.05 -R

Your Environment

adrianeboyd commented 4 years ago

This warning is really too long and detailed. It's related to the normalization table, not the lemmatizer tables. There isn't a Spanish normalization table (it's not in the list of languages at the end of the warning), so what you have is fine. You might still want spacy-lookups-data for the lemmatizer tables if you want the lookup lemmas in your model.
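
In code the distinction looks roughly like this (a sketch, assuming the same es_core_news_md pipeline as above; the exact NORM fallback behaviour is an assumption):

import spacy

nlp = spacy.load('es_core_news_md')
tok = nlp('comiendo')[0]

# Lemma: comes from the lemmatizer tables provided by spacy-lookups-data.
print(tok.lemma_)  # comer

# Norm: would be backed by a lexeme normalization table; Spanish has none,
# so this typically just falls back to the lowercased token text.
print(tok.norm_)   # comiendo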

fcggamou commented 4 years ago

Thanks a lot for clarifying, and sorry for the confusion; I didn't know the distinction between the normalization table and the lemmatizer tables.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.