Closed AngledLuffa closed 1 year ago
It's been a while since I ran these tests, but it looks like they were run against Stanford CoreNLP version 2018-10-05, aka v3.9.2. Since this is a Python library, I had to use SNLP's web interface and run CoreNLP as a server. I don't think the call you're referencing is available via HTTP requests, hence the inability to specify the POS tags.
@bjascob , I'm also very interested in seeing an updated comparison, particularly with the latest spaCy 3.4 and CoreNLP 4.5.1!
If you wind up rerunning it, we can make sure there's a suitable interface for the POS tag version of Morphology
Can someone tell me how to call SNLP from Python with a word and its Penn-style POS tag and have it return the lemma? The built-in SNLP web server interface I used previously is set up to parse an entire sentence, not to take in a single word. If SNLP has the capability of doing this, it should make for a much better test. The last time I looked (probably several years ago) I didn't see a way to do this.
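For context, the sentence-level server call described above can be sketched roughly as follows. This assumes a CoreNLP server already running locally on port 9000 (the host, port, and helper names here are illustrative, not taken from the original test code). The server tags the whole sentence itself, which is why a known POS tag cannot be supplied this way.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_lemma_request(sentence, server="http://localhost:9000"):
    """Build a POST request asking the CoreNLP server for lemmas as JSON."""
    props = {"annotators": "tokenize,ssplit,pos,lemma", "outputFormat": "json"}
    url = server + "/?" + urlencode({"properties": json.dumps(props)})
    return Request(url, data=sentence.encode("utf-8"), method="POST")


def extract_lemmas(response_json):
    """Flatten the server's JSON into (word, pos, lemma) triples."""
    return [
        (tok["word"], tok["pos"], tok["lemma"])
        for sent in response_json["sentences"]
        for tok in sent["tokens"]
    ]


def lemmatize_sentence(sentence, server="http://localhost:9000"):
    """Send the sentence to a running server (assumed to be up) and parse the reply."""
    with urlopen(build_lemma_request(sentence, server)) as resp:
        return extract_lemmas(json.load(resp))
```

Note that the POS tag in each triple is whatever the server's tagger chose; there is no field in this request for passing in a tag.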
BTW, I won't have time to spend on this in the next month, but after that I'm up for revising the testing if someone has a way to get this info from SNLP. A code snippet of the HTTP requests to do this would be ideal. If this should be done through Stanza instead of SNLP's web interface, let me know that too. If so, sample code on how to use that lib to get the info would be helpful.
Are you starting from known POS tags, or raw text you want tagged?
Starting from words with known POS tags and asking for the lemma. There are no sentences to parse in the Automatically Generated Inflection Database (AGID) used for testing.
Sounds good. I will add a Python - Java interface which allows for adding lemmas to tagged words.
Hmm, one thing that will be tricky will be that sometimes we distinguish between ADJ and ADV. It's relevant for "best", "worst", "better", and "worse". Also verb & noun forms may depend on the particular POS used. In general I think it will be okay, though
I added this to the dev branch of CoreNLP:
https://github.com/stanfordnlp/CoreNLP/commit/71bc95dfaf984f7056e0856414738be0706cf9e3
I added this to the dev branch of stanza:
https://github.com/stanfordnlp/stanza/pull/1144
I expect both to be released in late November or early December, hopefully. If you need it sooner, let us know. It uses xpos tags, but seeing as how you have just N, V, or A, I think you can get away with changing all nouns to NN, leaving all verbs as V, and changing all adjectives & adverbs to JJS if the word ends with "est" and JJR otherwise. This will have some weird effects on words such as "honest", though, not to mention the small handful of words which get treated differently if they are adjectives or adverbs.
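A minimal sketch of that tag conversion, following the suggestion literally (the function name is made up for illustration and is not part of either library):

```python
def van_to_ptb(word, van_tag):
    """Map an AGID-style tag (V, A or N) to the single tag suggested above."""
    if van_tag == "N":
        return "NN"
    if van_tag == "V":
        # Left unchanged, as proposed in the comment above.
        return "V"
    if van_tag == "A":
        # Superlative if the word ends in "est", comparative otherwise.
        return "JJS" if word.endswith("est") else "JJR"
    raise ValueError(f"unexpected tag: {van_tag!r}")
```

With this rule, "honest" tagged A is mapped to JJS, which is the kind of weird effect mentioned above.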
Great. If you remember, drop a quick note in here when it's released and I'll get an email update. If not, I'll likely remember to check back in a month or so anyway.
Can I assume that the def main() in morphology.py is good example code on how to use this in the Stanza library? (I haven't used Stanza before, just direct calls to the SNLP server.)
As a note to myself --> the Stanza API takes Penn Treebank style tags and the Lemminflect inflection test corpus only has VAN tags (V, A or N). For testing, convert VAN tags to the closest PTB style tag. Consider trying all possible PTB tags for the word to verify that scores are not artificially lower due to the conversion.
Yep! That was the intent. I could also add some other interface, such as passing in tuples of (word, tag)
I went ahead and compiled the "dev" versions of stanza 1.5 and CoreNLP 4.5.2 and re-ran tests. The results were...
Stanza version: 1.5.0
119,310 total test cases where 0 had no returns.
27.0 usecs per lemma
5,440 incorrect lemmas = 95.4% accuracy
Results by pos type
VERB : 2,596 / 43,171 = 94.0% accuracy
ADJ/ADV : 247 / 3,530 = 93.0% accuracy
NOUN : 2,597 / 72,609 = 96.4% accuracy
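The figures above read as incorrect / total, with accuracy being the complement of that ratio. A quick check of the arithmetic, using only the counts reported here:

```python
# (incorrect, total) per POS type, copied from the results above
results = {
    "VERB": (2596, 43171),
    "ADJ/ADV": (247, 3530),
    "NOUN": (2597, 72609),
}


def accuracy_pct(incorrect, total):
    """Accuracy as a percentage, given the number of incorrect lemmas."""
    return 100.0 * (1 - incorrect / total)


total_incorrect = sum(bad for bad, _ in results.values())
total_cases = sum(n for _, n in results.values())

for pos, (bad, n) in results.items():
    print(f"{pos:8s}: {accuracy_pct(bad, n):.1f}%")
print(f"overall : {accuracy_pct(total_incorrect, total_cases):.1f}%")
```

The per-type counts sum to the 5,440 incorrect lemmas and 119,310 test cases reported in the totals.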
Since the AGID only has V, A or N for part-of-speech and Stanza wants the Penn Treebank tag, the code tries all the relevant PTB tags, creates a set of possible answers and considers the result "passed" if the correct answer was in the set. Also note the time/lemma is for passing in the entire set at once. If I call it one at a time it would take all day (literally).
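The scoring scheme described above could look roughly like this. It is a sketch, not the actual test code: the candidate tag lists are an assumption about which PTB tags count as "relevant", and lemmatize_batch is a hypothetical stand-in for whatever batch call returns one lemma per (word, tag) pair.

```python
# Assumed mapping from AGID tags to candidate Penn Treebank tags.
CANDIDATE_TAGS = {
    "N": ["NN", "NNS"],
    "V": ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"],
    "A": ["JJ", "JJR", "JJS", "RB", "RBR", "RBS"],
}


def score(cases, lemmatize_batch):
    """cases: list of (word, van_tag, gold_lemma).

    lemmatize_batch(words, tags) is a hypothetical callable returning a list
    of lemmas, one per input pair. Returns (passed, total).
    """
    words, tags, owners = [], [], []
    for i, (word, van_tag, _gold) in enumerate(cases):
        for tag in CANDIDATE_TAGS[van_tag]:
            words.append(word)
            tags.append(tag)
            owners.append(i)

    # One batched call for everything, since per-word calls are far too slow.
    lemmas = lemmatize_batch(words, tags)

    candidates = [set() for _ in cases]
    for i, lemma in zip(owners, lemmas):
        candidates[i].add(lemma)

    # A case passes if the gold lemma appears among any of its candidates.
    passed = sum(gold in cand for (_, _, gold), cand in zip(cases, candidates))
    return passed, len(cases)
```

Because a case passes if any candidate tag yields the gold lemma, this measures an upper bound relative to using a single converted tag per word.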
BTW... the numbers here are very close to what I get in Lemminflect. The AGID used here for testing is not necessarily a gold standard and even English experts may disagree on the "correct" answer in some cases. I suspect mid-90s accuracy (aka agreement) is probably as good as it's ever going to get. I'll put a note to this effect in README.
The interesting thing here is really how poorly NLTK and spaCy perform compared to the state of the art.
@AngledLuffa I noticed that Stanza 1.5 and CoreNLP 4.5.3 were just released and thought I'd re-test them. Stanza 1.5 works fine but CoreNLP 4.5.3 does not include the new morphology class. That's still only in the dev branch (just FYI in case this is an oversight).
Also note that the Stanza morphology code does not access Java if you only set CORENLP_HOME as per the instructions on the main page. For it even to attempt to make the Java call, CLASSPATH must be set. It looks to me like this is due to a check in the new Python morphology code. I can file a bug report for this if you want.
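As a workaround sketch for the behavior described above, both variables can be set from Python before the morphology code is used. The install directory below is hypothetical, and the helper name is made up; the "directory/*" form is the standard Java classpath wildcard for all jars in a directory.

```python
import os


def corenlp_env(corenlp_home, base_env=None):
    """Return a copy of the environment with CORENLP_HOME and CLASSPATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CORENLP_HOME"] = corenlp_home
    # Java expands "dir/*" to every jar in that directory.
    wildcard = os.path.join(corenlp_home, "*")
    existing = env.get("CLASSPATH")
    env["CLASSPATH"] = wildcard if not existing else existing + os.pathsep + wildcard
    return env


# Example (hypothetical path): apply to the current process.
# os.environ.update(corenlp_env("/path/to/stanford-corenlp"))
```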
Whoops, thanks for catching that. Our distribution packaging script starts from specific main programs rather than including all of our repo, and I forgot to add that base program. I will do so now and make a new CoreNLP 4.5.4. I made a couple changes to Ssurgeon based on feedback from presenting it to people over the weekend (with more to come).
Also, the dev branch of stanza now doesn't need $CLASSPATH, since CoreNLP 4.5.4 should have it correctly included:
https://github.com/stanfordnlp/stanza/commit/4dda14bd585893044708c70e30c1c3efec509863
OK. Looks like things are working as they should, and I've updated the README to reflect CoreNLP 4.5.4. I assume this closes the issue, but feel free to add more comments if there are additional issues here.
Awesome, thanks for the update!
I'd be interested to see what we're getting wrong, especially for adj & adv. It's deterministic, so there may very well be a class of words we missed.
I'm not sure how you used the lemma annotator for CoreNLP to test the lemmatizer, but the Morphology class definitely does use POS tags if available:
https://github.com/stanfordnlp/CoreNLP/blob/main/src/edu/stanford/nlp/process/Morphology.java
For example, the WordTag stemStatic(String word, String tag) interface.
FWIW, the next version of CoreNLP will cover ADJ & ADV as well.