buriy / spacy-ru

Russian language models for spaCy
MIT License

Comparing with Stanza Russian models #22

Closed by lexmosolov 4 years ago

lexmosolov commented 4 years ago

Which library is better, spacy-ru or Stanza, and in which cases?

buriy commented 4 years ago

See https://github.com/natasha/naeval ; they added Stanza a few days ago. (Discussion of the comparison, in Russian: https://github.com/natasha/naeval/issues/1 ) TL;DR: Stanza has good and fast POS tagging and dependency parsing, but (relatively) poor and slow NER.

lexmosolov commented 4 years ago

And what about using Stanza models in a spaCy pipeline? https://github.com/explosion/spacy-stanza

buriy commented 4 years ago

It's just a wrapper, as usual. You can use it to get predictions, but you can't train your spaCy models with it.
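For illustration, here is a minimal sketch of what "using it to get predictions" could look like. It assumes a recent spacy-stanza release (1.x, for spaCy 3), where `spacy_stanza.load_pipeline` is the entry point; older releases, including the ones current when this thread was written, exposed a different interface. The helper names `load_stanza_pipeline` and `token_rows` are hypothetical and exist only for this example.

```python
def load_stanza_pipeline(lang="ru"):
    """Build a spaCy-compatible pipeline backed by Stanza models.

    Assumption: spacy-stanza 1.x and stanza are installed. The imports are
    kept inside the function so that token_rows() below can be used
    without either package.
    """
    import stanza
    import spacy_stanza

    stanza.download(lang)  # fetches the Stanza models if they are missing
    return spacy_stanza.load_pipeline(lang)


def token_rows(doc):
    """Collect (text, POS tag, dependency label, head text) for each token.

    Works on any spaCy Doc, whether it comes from a native spaCy model
    or from the Stanza wrapper.
    """
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]
```

A call such as `token_rows(load_stanza_pipeline()("Мама мыла раму."))` would then return Stanza's predictions through the usual spaCy token attributes. The annotations come from Stanza, so the wrapper is useful for inference but gives you nothing to train as a spaCy component.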

buriy commented 4 years ago

Syntax (dependency parsing) quality for https://github.com/buriy/spacy-ru/releases/tag/v2.3_pre1 :

data/grameval/news.json:
UAS       93.29
LAS       86.86
data/grameval/wiki.json:
UAS       83.74
LAS       72.16
data/grameval/fiction.json:
UAS       95.67
LAS       91.12
data/grameval/social.json:
UAS       79.62
LAS       70.03
data/grameval/poetry.json:
UAS       72.04
LAS       60.52

Benchmarked with https://github.com/natasha/naeval . Compared to Stanza, the scores are very close (sometimes better, sometimes worse), but spaCy is ~3x faster.
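For readers unfamiliar with the metrics above: UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) additionally requires the dependency label to match. The sketch below shows the textbook definition. It is not naeval's actual scoring code, which may differ in details such as punctuation handling, and the function name and toy data are made up for the example.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) in percent for one tokenization.

    gold, pred: lists of (head_index, dep_label) pairs, one per token,
    in the same token order. Assumes both sides share the same tokens.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of tokens")
    if not gold:
        return 0.0, 0.0

    total = len(gold)
    # UAS: only the head index has to match.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head index and the label have to match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Toy example: 4 tokens, one wrong label and one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS {uas:.2f}  LAS {las:.2f}")  # UAS 75.00  LAS 50.00
```

LAS can never exceed UAS, which matches the pattern in the per-domain numbers above.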