mpsota closed this issue 4 years ago.
Hmm, this is a problem with the default tag map for Polish. It wasn't validated properly because we never trained a model using this tag set. The XPOS tags from the Stanza model trigger some of the invalid mappings in an intermediate step, even though those values are overridden with the XPOS and UPOS tags from the Stanza model in the end anyway.
I think the simplest workaround is to modify the tag map in the model to remove all the mappings:
```python
import stanza
from spacy_stanza import StanzaLanguage

snlp = stanza.Pipeline(lang='pl')
nlp = StanzaLanguage(snlp)

# Remove all mappings from the default Polish tag map so that the XPOS
# tags produced by Stanza no longer trigger the invalid mappings
for tag in nlp.vocab.morphology.tag_map:
    nlp.vocab.morphology.tag_map[tag] = {}

doc = nlp('To jest błąd')
```
You can also make changes to the default tag map (in `spacy/lang/pl/tag_map.py`) and install spaCy from source, but that is probably more work than the solution above.
If you do want to fix the tag map for spaCy v2, you need to know that it requires a slightly unusual encoding of `Person` values (as the strings `one`/`two`/`three` instead of `1`/`2`/`3`), but this restriction is going to be removed in spaCy v3, so it's not worth putting much effort into it here. I'll try to validate all the tag maps for the next patch release of v2.3 so people don't run into weird behavior like this.
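If you do go down that route, the `Person` conversion could be scripted rather than done by hand. The following is a minimal sketch, assuming the tag map is a plain dict from tag strings to feature dicts that store the feature under the string key `"Person"`. `normalize_person` and `PERSON_V2` are hypothetical names (not part of spaCy's API), and the example tag strings are illustrative only.

```python
# Hypothetical mapping from numeric Person values to the spaCy v2 encoding
PERSON_V2 = {"1": "one", "2": "two", "3": "three"}


def normalize_person(tag_map):
    """Return a copy of tag_map with Person values 1/2/3 rewritten as one/two/three.

    Assumes each value in tag_map is a dict of features; the input is not modified.
    """
    fixed = {}
    for tag, attrs in tag_map.items():
        attrs = dict(attrs)  # shallow copy so the original entry is left untouched
        person = attrs.get("Person")
        if person is not None:
            # Accept both "1" and 1; leave already-converted or unknown values as-is
            attrs["Person"] = PERSON_V2.get(str(person), person)
        fixed[tag] = attrs
    return fixed


# Illustrative tag map entries (hypothetical tags and features)
example = {
    "fin:sg:pri:imperf": {"pos": "VERB", "Person": "1"},
    "subst:sg:nom:m1": {"pos": "NOUN"},
}
fixed = normalize_person(example)
```

Returning a copy instead of editing in place makes it easy to diff the converted map against the original before writing it back to `tag_map.py`.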
Thank you. I agree that fixing the tag map isn't worth the effort; the workaround is fine for me!
I've successfully run the spacy-stanza example for English. However, I can't get it working with Polish.
The above works, but many other things fail:
Is this because there is no "NER" processor for Polish in Stanza? Is there an easy fix to make it work?