explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Rewrite alignment to preserve whitespace tokens #41

Closed: adrianeboyd closed this 4 years ago

adrianeboyd commented 4 years ago

Rewrite the alignment algorithm so that the words and spaces are created with a copy of spacy.util.get_words_and_spaces, and the Stanza annotation is then aligned to those words, with positions and offsets adjusted around the additional whitespace tokens.
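As a rough illustration of the idea (not the code in this PR, and not spaCy's actual implementation), the sketch below splits a text into words and trailing-space flags, turns any whitespace the tokenizer skipped into its own token, and records where each original token ended up. The names `words_and_spaces` and `remap_heads` are hypothetical, and it assumes every token string appears verbatim in the text and that head indices are 0-based token positions.

```python
from typing import List, Tuple


def words_and_spaces(
    text: str, tokens: List[str]
) -> Tuple[List[str], List[bool], List[int]]:
    """Return (words, spaces, token_to_word) for `text`.

    Hypothetical helper: extra whitespace between tokens becomes its own
    word, and token_to_word[i] is the index in `words` of tokens[i].
    """
    words: List[str] = []
    spaces: List[bool] = []
    token_to_word: List[int] = []
    pos = 0
    for token in tokens:
        start = text.find(token, pos)
        if start < 0:
            raise ValueError(f"token {token!r} not found after offset {pos}")
        if start > pos:
            gap = text[pos:start]
            if gap.strip():
                raise ValueError(f"unexpected non-whitespace text {gap!r}")
            # Whitespace not covered by a trailing-space flag is kept as a token.
            words.append(gap)
            spaces.append(False)
        token_to_word.append(len(words))
        words.append(token)
        spaces.append(False)
        pos = start + len(token)
        # A single space right after the token is folded into its flag.
        if pos < len(text) and text[pos] == " ":
            spaces[-1] = True
            pos += 1
    if pos < len(text):
        trailing = text[pos:]
        if trailing.strip():
            raise ValueError(f"unexpected non-whitespace text {trailing!r}")
        words.append(trailing)
        spaces.append(False)
    return words, spaces, token_to_word


def remap_heads(heads: List[int], token_to_word: List[int]) -> List[int]:
    """Shift 0-based head indices so they point at the new word positions."""
    return [token_to_word[head] for head in heads]


text = "Hello  world\n!"
words, spaces, token_to_word = words_and_spaces(text, ["Hello", "world", "!"])
# Rebuilding the text from words and flags should round-trip exactly.
rebuilt = "".join(w + (" " if s else "") for w, s in zip(words, spaces))
```

Because the inserted whitespace tokens shift every later position, any annotation expressed as a token index (such as dependency heads) has to go through `token_to_word` before it is attached to the final words.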

Fixes #30, fixes #33.