Lambda-3 / PyCobalt

Coreference Resolution in Python

Replace CoreNLP with spaCy #3

Open bermeitinger-b opened 7 years ago

bermeitinger-b commented 7 years ago

Starting the CoreNLP server is not pleasant for anyone: it is big, relatively slow, and a bit clunky to use. The alternatives are spaCy and NLTK.

First experiments show that NLTK's named entity recognition is not very accurate and that its sentence splitter is worse than CoreNLP's. The next choice is spaCy, which shows good results in simple experiments. Before we implement this, we have to check the following:

leonardossz commented 7 years ago

Are you talking about this: https://stanfordnlp.github.io/CoreNLP/corenlp-server.html ?

bermeitinger-b commented 7 years ago

Yes. PyCobalt currently uses CoreNLP as its NLP tool for POS tagging and NER. Starting the CoreNLP server is cumbersome, even with Docker. With spaCy, all code runs directly in Python. This benchmark shows spaCy's superiority in speed; its NER is slightly worse. Without CoreNLP, PyCobalt could be published as a "simple" Python module.
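As a rough sketch of what "all code directly in Python" could look like, the snippet below collects the three annotations mentioned in this thread (sentence splitting, POS tags, named entities) from an in-process spaCy pipeline. It is not PyCobalt code: the helper names `load_pipeline` and `annotate`, the returned dictionary layout, and the model name `en_core_web_sm` are assumptions made for illustration, and the calls follow the spaCy v3 API.

```python
from typing import Any, Dict, List


def load_pipeline(model_name: str = "en_core_web_sm") -> Any:
    """Load a spaCy pipeline in-process; no separate server is started.

    Assumption: spaCy v3 and the named model are installed, e.g. with
    `pip install spacy` and `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported lazily so this module loads without spaCy

    return spacy.load(model_name)


def annotate(nlp: Any, text: str) -> Dict[str, List[Any]]:
    """Run the pipeline once and collect sentences, POS tags and entities.

    `nlp` is any callable that returns a spaCy-like Doc (hypothetical
    helper, not part of PyCobalt or spaCy).
    """
    doc = nlp(text)
    return {
        # Sentence boundaries come from the parser or a sentencizer.
        "sentences": [sent.text for sent in doc.sents],
        # Coarse-grained universal POS tag for every token.
        "pos": [(token.text, token.pos_) for token in doc],
        # Named entities with their labels (e.g. PERSON, ORG).
        "entities": [(ent.text, ent.label_) for ent in doc.ents],
    }
```

With spaCy and a model installed, `annotate(load_pipeline(), "Angela Merkel visited Passau. She gave a speech.")` would return the three lists for that text. Keeping `annotate` independent of how the pipeline was loaded means a different model could be swapped in without touching the calling code.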

swathimithran commented 7 years ago

Great move. I think spaCy will be much better than CoreNLP. I am eagerly waiting for this update. Please let me know if you need any help.

bermeitinger-b commented 7 years ago

I'm sorry for raising expectations about the implementation and the timeline. This issue was meant as a reminder for myself, for when I have time in the future. It won't be resolved this month or next. We would be happy to accept a pull request, though.

swathimithran wrote on Thu, 27 July 2017, 11:05:

Great move. I think spaCy will be much better than CoreNLP. I am eagerly waiting for this update. Please let me know if you need any help.

