Important: please see https://github.com/natasha/slovnet#morphology-1
Morphological analyzer (POS tagger) for Russian and English, based on neural networks and dictionary lookup (pymorphy2, nltk).

Russian, accuracy by domain:

| Domain | Full tag | PoS tag | Full tag + lemma | Sentence full tag | Sentence full tag + lemma |
|---|---|---|---|---|---|
| Lenta (news) | 96.31% | 98.01% | 92.96% | 77.93% | 52.79% |
| VK (social) | 95.20% | 98.04% | 92.06% | 74.30% | 60.56% |
| JZ (lit.) | 95.87% | 98.71% | 90.45% | 73.10% | 43.15% |
| All | 95.81% | 98.26% | N/A | 74.92% | N/A |

English, accuracy:

| Dataset | Full tag | PoS tag | Full tag + lemma | Sentence full tag | Sentence full tag + lemma |
|---|---|---|---|---|---|
| UD EWT test | 91.57% | 94.10% | 87.02% | 63.17% | 50.99% |

Speed: 200 to 600 words per second on CPU.

Memory consumption: about 500-600 MB for single-sentence predictions.
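
The speed figure above can be checked on your own hardware. A minimal sketch, assuming only that the predictor is a function mapping a list of words to a list of analyses; `words_per_second` and `dummy_predict` are hypothetical names used for illustration and are not part of rnnmorph:

```python
import time
from typing import Callable, List, Sequence


def words_per_second(
    predict: Callable[[List[str]], Sequence],
    sentences: List[List[str]],
) -> float:
    """Run `predict` on each sentence and return processed words per second."""
    total_words = sum(len(sentence) for sentence in sentences)
    start = time.perf_counter()
    for sentence in sentences:
        predict(sentence)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return total_words / elapsed if elapsed > 0 else float("inf")


def dummy_predict(words: List[str]) -> List[str]:
    """Stand-in for a real predictor so the sketch runs without the model."""
    return [word.upper() for word in words]


rate = words_per_second(dummy_predict, [["мама", "мыла", "раму"]] * 1000)
print(f"{rate:.0f} words per second")
```

To measure the real model, pass the `predict` method of an `RNNMorphPredictor` instance in place of `dummy_predict`. Results will depend on sentence length and hardware.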

Installation:

    pip install rnnmorph

Usage:

    from rnnmorph.predictor import RNNMorphPredictor

    # Load the Russian model and analyze one tokenized sentence.
    predictor = RNNMorphPredictor(language="ru")
    forms = predictor.predict(["мама", "мыла", "раму"])

    print(forms[0].pos)
    # NOUN
    print(forms[0].tag)
    # Case=Nom|Gender=Fem|Number=Sing
    print(forms[0].normal_form)
    # мама
    print(forms[0].vector)
    # [0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 1]
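
The `tag` value in the example is a string of `Feature=Value` pairs joined by `|`. A minimal sketch of turning it into a dictionary, assuming every tag follows that layout and that a tag with no features is written as `_` (the Universal Dependencies convention, which is an assumption here). `parse_tag` is a hypothetical helper, not part of rnnmorph:

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split 'Case=Nom|Gender=Fem' into {'Case': 'Nom', 'Gender': 'Fem'}."""
    # Assumption: an empty feature set is written as "_" or an empty string.
    if not tag or tag == "_":
        return {}
    features = {}
    for pair in tag.split("|"):
        # partition keeps the name even if a pair has no "=" in it.
        name, _, value = pair.partition("=")
        features[name] = value
    return features


features = parse_tag("Case=Nom|Gender=Fem|Number=Sing")
print(features["Case"])
```

With a real prediction, the same call would be `parse_tag(forms[0].tag)`, which makes it easy to filter words by case, gender, or number.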