If there are multiple whitespace characters between tokens, the Tokenizer raises a warning and the entity is not extracted. It looks like stanza does not treat the extra whitespace as a token, while spaCy does.
```python
import stanza
from spacy_stanza import StanzaLanguage

snlp = stanza.Pipeline(lang='en')
nlp = StanzaLanguage(snlp)

# The double spaces below are a reconstruction (the extra whitespace was collapsed
# when the issue was rendered); with this spacing "two" starts at character 12,
# which matches the offsets in the warning below.
text = "There  are  two spaces between these words"
doc = nlp(text)
```
Running this produces:

```
UserWarning: Can't set named entities because the character offsets don't map to valid tokens produced by the Stanza tokenizer:
Words: ['There', 'are', 'two', 'spaces', 'between', 'these', 'words']
Entities: [('two', 'CARDINAL', 12, 15)]
```
```python
print(len(doc.ents))  # >>> 0
```
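For comparison, here is a minimal sketch (using a blank spaCy English pipeline, not the spacy-stanza pipeline above) of how spaCy's own tokenizer handles the same text: the run of extra spaces is kept as a separate whitespace token, so character offsets still line up with token boundaries.

```python
import spacy

# Blank English pipeline: spaCy's tokenizer only, no Stanza involved.
nlp_spacy = spacy.blank("en")
doc_spacy = nlp_spacy("There  are  two spaces between these words")

# The extra spaces show up as their own whitespace tokens, e.g.
# ['There', ' ', 'are', ' ', 'two', 'spaces', 'between', 'these', 'words']
print([t.text for t in doc_spacy])
```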