explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Morphological features are lost in russian model #28

Closed SergeyShk closed 11 months ago

SergeyShk commented 4 years ago

spaCy version: 2.1.9
spacy-stanza version: 0.2.1

import stanza
from spacy_stanza import StanzaLanguage

stanza.download('ru')
snlp = stanza.Pipeline(lang="ru")
nlp = StanzaLanguage(snlp)
text = "Мама мыла раму"

Using Stanza, I get this:

for sentence in snlp(text).sentences:
    for word in sentence.words:
        print(word.feats)

# Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing
# Animacy=Inan|Case=Gen|Gender=Neut|Number=Sing
# Animacy=Inan|Case=Acc|Gender=Fem|Number=Sing

Using spacy-stanza, I get this:

for token in nlp(text):
    print(token.tag_)

# 
# 
# 

The other annotations (lemma, pos, dep, etc.) are available, though.
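
For readers who want to work with the feature strings Stanza prints above, here is a minimal sketch that splits one into a dictionary. It assumes the usual `Key=Value|Key=Value` layout seen in the output; `parse_feats` is a hypothetical helper name, not part of Stanza or spacy-stanza.

```python
def parse_feats(feats):
    """Split a 'Key=Value|Key=Value' feature string into a dict.

    Hypothetical helper for illustration only. None, an empty string,
    or the placeholder "_" all give an empty dict.
    """
    if not feats or feats == "_":
        return {}
    # Split on the first "=" only, so values are kept intact.
    return dict(pair.split("=", 1) for pair in feats.split("|"))


features = parse_feats("Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])  # Nom
```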

ines commented 4 years ago

Thanks for the report! It looks like the wrapper currently only uses the token.xpos for the tag, which doesn't seem to exist here. (Or maybe it changed in a Stanza update and I missed it.) Adding (or appending?) the token.feats here should fix it:

https://github.com/explosion/spacy-stanza/blob/58f562726aaf1c5f7e72e5313e7aece46bd33d7d/spacy_stanza/language.py#L157
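
The suggestion above could be sketched roughly as follows. This is not the wrapper's actual code: `make_tag`, the `append` flag, and the `"__"` separator are all hypothetical choices made for illustration, covering both readings of "adding (or appending?)".

```python
def make_tag(xpos, feats, append=False, sep="__"):
    """Build a tag string from xpos and feats (hypothetical helper).

    - If xpos is missing, fall back to feats.
    - If append is True and both exist, join them with sep.
    - If neither exists, return an empty string.
    """
    parts = [xpos] if xpos else []
    if feats and (append or not xpos):
        parts.append(feats)
    return sep.join(parts)


# Default Russian models: no xpos, so the features become the tag.
print(make_tag(None, "Animacy=Anim|Case=Nom|Gender=Fem|Number=Sing"))
```

With `append=True`, a word that has both values would get a combined tag, which keeps the xpos visible at the cost of a much larger tag set.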

SergeyShk commented 4 years ago

Yes, xpos does not exist in the Russian models. Can you fix this in the new version? https://github.com/explosion/spacy-stanza/pull/29

fingoldo commented 3 years ago

Any update on this? Stanza still seems to be missing xpos for Russian.

adrianeboyd commented 11 months ago

Just going through older issues...

The default ru models don't contain xpos, so for now spacy-stanza uses upos for token.tag if there's no xpos available. I think some of the non-default Russian models do contain xpos (Russian GSD looks like it does?).
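
The fallback described above can be sketched as below. This only illustrates the idea and is not copied from spacy-stanza; `choose_tag` is a hypothetical name, and `SimpleNamespace` stands in for a Stanza word so the snippet runs without downloading any models.

```python
from types import SimpleNamespace


def choose_tag(word):
    """Prefer xpos, fall back to upos, else return an empty string.

    Hypothetical helper; `word` is any object that may carry
    `xpos` and `upos` attributes.
    """
    return getattr(word, "xpos", None) or getattr(word, "upos", None) or ""


# Stand-in for a word from the default ru models, which have no xpos.
ru_word = SimpleNamespace(text="Мама", upos="NOUN", xpos=None)
print(choose_tag(ru_word))  # NOUN
```

To try a non-default model, Stanza's `Pipeline` accepts a `package` argument, for example `stanza.Pipeline(lang="ru", package="gsd")`. Whether that package actually provides xpos is not confirmed in this thread.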

Please feel free to reopen if you're still running into issues!