explosion / spacy-stanza

💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
MIT License

Nice Idea but without NER not that useful #12

Closed askhogan closed 5 years ago

askhogan commented 5 years ago

spaCy is great for visualizations and for the work done with Prodigy, but its NER engine comes nowhere close to CoreNLP's.

honnibal commented 5 years ago

I'd like to uh...gently suggest you rethink what value you're hoping to provide by opening this issue.

A team of current PhD students at Stanford (especially Timothy Dozat and Peng Qi) have produced neural network parsing models that have surpassed all of the industry labs in accuracy, achieving top scores on the relevant shared tasks. This wrapper lets you use those models with spaCy's API.

If you want to use CoreNLP's NER model, by all means go ahead. The CoreNLP NER model was much better than spaCy 1.0's NER model. As of spaCy v2.0, we do outperform CoreNLP's NER model on the benchmark datasets, but I'm aware of some datasets where CoreNLP scores much better than we do. They use a linear model with CRF decoding, which is significantly different from our algorithm.

But, all that said: It doesn't have anything to do with the purpose of this package.

askhogan commented 5 years ago

@honnibal How does it have nothing to do with this package? You titled the package spacy-stanfordnlp, yet it provides no support for NER, which is one of the most useful functions of stanfordnlp. Unless I am reading the docs wrong, where is the NER?

askhogan commented 5 years ago

Using this wrapper, you'll be able to use the following annotations, computed by your pretrained stanfordnlp model:

- Statistical tokenization (reflected in the Doc and its tokens)
- Lemmatization (token.lemma and token.lemma_)
- Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
- Dependency parsing (token.dep, token.dep_, token.head)
- Sentence segmentation (doc.sents)
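For reference, the annotations quoted above are all read through ordinary spaCy token attributes. The following is a minimal sketch, not the package's official example: `token_rows`, `load_wrapped_pipeline`, and `demo` are hypothetical helper names, and the imports assume the `stanfordnlp` and `spacy_stanfordnlp` packages with an English model already downloaded.

```python
def token_rows(doc):
    """Collect the per-token annotations named in the quoted list.

    Works on any iterable of token-like objects exposing spaCy's
    string attributes (text, lemma_, pos_, tag_, dep_, head).
    """
    return [
        (t.text, t.lemma_, t.pos_, t.tag_, t.dep_, t.head.text)
        for t in doc
    ]


def load_wrapped_pipeline(lang="en"):
    """Wrap a stanfordnlp pipeline so it can be called like a spaCy nlp object.

    Assumption: both packages are installed and the model for `lang`
    has been downloaded beforehand. Imports are kept local so the
    helper above stays usable without them.
    """
    import stanfordnlp
    from spacy_stanfordnlp import StanfordNLPLanguage

    snlp = stanfordnlp.Pipeline(lang=lang)
    return StanfordNLPLanguage(snlp)


def demo():
    """Print token annotations and sentence boundaries for a sample text."""
    nlp = load_wrapped_pipeline()
    doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
    for row in token_rows(doc):
        print(row)
    # Sentence segmentation is exposed through doc.sents.
    for sent in doc.sents:
        print(sent.text)
```

Calling `demo()` prints one tuple per token followed by the segmented sentences. Named entities are not in the quoted list, so the sketch does not read `doc.ents`.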

Where is Named Entity Recognition? https://stanfordnlp.github.io/CoreNLP/ner.html

Also, spaCy's own website specifically says that state-of-the-art accuracy comes with CoreNLP, not spaCy:

https://cl.ly/285a4edaf7a5/Image%202019-06-06%20at%207.32.06%20PM.png