bellingcat / cisticola

Coordinates scrapers and interfaces with database

Fix spacy warning #63

Open trislee opened 2 years ago

trislee commented 2 years ago

When running the tests, I'm getting this warning:

/home/work/.venv/cisticola/lib/python3.9/site-packages/spacy/pipeline/lemmatizer.py:211: UserWarning: [W108] The rule-based lemmatizer did not find POS annotation for one or more tokens. Check that your pipeline includes components that assign token.pos, typically 'tagger'+'attribute_ruler' or 'morphologizer'.

Based on [this post](https://stackoverflow.com/a/66452416), this warning could be fixed by adding `'lemmatizer'` to the disabled list when loading language pipelines, e.g. `nlp_en = spacy.load('en_core_web_sm', disable=['parser', 'tok2vec', 'attribute_ruler', 'lemmatizer'])`, but I'm not sure if/how that will affect the NLP results.
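If changing the pipeline turns out to affect results, one alternative that leaves the pipeline untouched is to filter out just this warning with Python's standard `warnings` module. The sketch below is only an illustration and does not import spaCy: `emit_w108` and `count_w108` are hypothetical helpers, and `emit_w108` stands in for whatever spaCy call triggers W108 in the tests. It assumes the warning text starts with `[W108]`, as in the output quoted above.

```python
import warnings

# Start of the warning text quoted in this issue.
W108_TEXT = (
    "[W108] The rule-based lemmatizer did not find POS annotation "
    "for one or more tokens."
)


def emit_w108():
    """Hypothetical stand-in for the spaCy call that triggers the warning."""
    warnings.warn(W108_TEXT, UserWarning)


def count_w108(suppress):
    """Return how many warnings get through, with or without the filter."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning by default so the effect of the filter is visible.
        warnings.simplefilter("always")
        if suppress:
            # `message` is a regex matched against the start of the warning
            # text, so the brackets have to be escaped.
            warnings.filterwarnings(
                "ignore", message=r"\[W108\]", category=UserWarning
            )
        emit_w108()
    return len(caught)


print(count_w108(suppress=False))
print(count_w108(suppress=True))
```

In the real code, only the `warnings.filterwarnings(...)` call would be needed, placed somewhere that runs before the pipelines are used. This hides the message but does not change what the lemmatizer does, so it sidesteps the question of how disabling components affects the output rather than answering it.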