explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Speed #54

Open hg2051 opened 3 years ago

hg2051 commented 3 years ago

spacy-stanza is much slower than Stanza on its own.

adrianeboyd commented 3 years ago

There is some overhead in aligning the annotation and creating the spaCy Doc, although I wouldn't have expected it to be that significant vs. the stanza processing time.

The main difference may be batching, though. stanza doesn't have any native support for batching (they just suggest concatenating docs with \n\n), so the spacy-stanza wrapper processes each text individually, even with nlp.pipe. We do this because we also want to be able to process docs that contain \n\n without problems.
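
To make the batching limitation concrete, here is a minimal plain-Python sketch of the \n\n concatenation approach. It makes no stanza or spaCy calls, and `join_batch` and `split_batch` are hypothetical helper names used only for illustration. It shows why a text that already contains a blank line can't be mapped back to its original document after joining.

```python
SEPARATOR = "\n\n"  # the delimiter suggested for concatenating documents


def join_batch(texts):
    """Concatenate texts into one string, as the suggested workaround does."""
    return SEPARATOR.join(texts)


def split_batch(joined):
    """Recover documents by splitting on the separator."""
    return joined.split(SEPARATOR)


clean = ["First document.", "Second document."]
messy = ["First paragraph.\n\nSecond paragraph.", "Another document."]

# Round-trips correctly: two texts in, two documents out.
clean_roundtrip = split_batch(join_batch(clean))

# Breaks: the embedded blank line is treated as a document boundary,
# so two texts come back as three documents.
messy_roundtrip = split_batch(join_batch(messy))

print(len(clean), len(clean_roundtrip))
print(len(messy), len(messy_roundtrip))
```

Processing each text individually avoids this ambiguity, at the cost of losing whatever speedup a single concatenated call would give.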

Can you provide more details about how you're using spacy-stanza?
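
One way to supply those details is a small timing script. The sketch below is only a harness, not a benchmark of either library: `time_calls` is a hypothetical helper, and the `str.upper` stand-in is an assumption so the script runs without any models installed. It should be replaced with the pipelines actually being compared, such as the stanza pipeline object and the spacy-stanza `nlp` object.

```python
import time


def time_calls(process, texts, repeats=3):
    """Return the best wall-clock time in seconds for calling process on each text."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            process(text)
        best = min(best, time.perf_counter() - start)
    return best


texts = ["This is a short test sentence."] * 100

# Stand-in callable so the script runs as written.
# Swap in the stanza pipeline and the spacy-stanza nlp object to compare them.
elapsed = time_calls(str.upper, texts)
print(f"{elapsed:.6f} s for {len(texts)} texts")
```

Reporting both timings along with the number and length of the texts would make it easier to tell whether the slowdown comes from the per-text processing or from the Doc creation overhead.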