explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy

Which model actually is working? #23

Closed: lingvisa closed this issue 4 years ago

lingvisa commented 4 years ago

In this example, the pipeline seems to contain NER models from both StanfordNLP and spaCy. How do I know which model is actually producing the results? Does the spaCy model overwrite the StanfordNLP model because of nlp.add_pipe(ner)?

import stanfordnlp
import spacy
from spacy_stanfordnlp import StanfordNLPLanguage

# Build a StanfordNLP pipeline and wrap it as a spaCy-compatible nlp object
snlp = stanfordnlp.Pipeline(lang="en", models_dir="./models")
nlp = StanfordNLPLanguage(snlp)

# Load spaCy's pre-trained en_core_web_sm model, get the entity recognizer and
# add it to the StanfordNLP model's pipeline
spacy_model = spacy.load("en_core_web_sm")
ner = spacy_model.get_pipe("ner")
nlp.add_pipe(ner)

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
print([(ent.text, ent.label_) for ent in doc.ents])
# [('Barack Obama', 'PERSON'), ('Hawaii', 'GPE'), ('2008', 'DATE')]
ines commented 4 years ago

StanfordNLP (not to be confused with CoreNLP) has no NER component, so the component that runs here is spaCy's. The example was meant to demonstrate that you can combine this wrapper with arbitrary other spaCy components, including custom pipeline components. The StanfordNLP wrapper itself adds no components to the pipeline, since all of its processing runs jointly as part of tokenization.
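One quick way to confirm this (a minimal sketch, assuming the v2-era setup from the example above, where nlp is the wrapped StanfordNLPLanguage object) is to list the pipeline components: only components added explicitly on the spaCy side show up, so any entities must come from spaCy's NER.

# Inspect the wrapped pipeline: StanfordNLP's processing happens inside the
# tokenizer, so it never appears as a pipeline component; only explicitly
# added spaCy components (here, the NER from en_core_web_sm) are listed.
print(nlp.pipe_names)
# ['ner']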

StanfordNLP will run whichever processors you enable when you create the pipeline – see the docs for details: https://stanfordnlp.github.io/stanfordnlp/pipeline.html
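For example (a sketch, using the standard processors argument of stanfordnlp.Pipeline together with the same v2-era wrapper as above), you could enable only the processors you need and still add spaCy's NER on top:

# Enable only tokenization, POS tagging and lemmatization in StanfordNLP;
# named entities then come exclusively from the spaCy component added below.
snlp = stanfordnlp.Pipeline(lang="en", processors="tokenize,pos,lemma", models_dir="./models")
nlp = StanfordNLPLanguage(snlp)
nlp.add_pipe(spacy.load("en_core_web_sm").get_pipe("ner"))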

lingvisa commented 4 years ago

Good to know and thanks, Ines.