emorynlp / nlp4j-old

NLP tools developed by Emory University.

Typo in example in Data Format? #1

Closed justhalf closed 8 years ago

justhalf commented 8 years ago

When looking at this file, I noticed that the lemma column in the example doesn't match the word ("founder" is lemmatized as "owner" and "EmoryNLP" as "emory").

Is this intentional?

Also, I noticed that "'s" in "He's" is lemmatized as "be" with the POS tag "VBZ". I can't reproduce this using the EnglishMorphAnalyzer in nlp4j-morphology (following this example).

Is the example manually written? Or is there some other process that preprocesses the words?

I'm asking because I believe the example should match the system's behavior exactly.

jdchoi77 commented 8 years ago

The examples were hand-written, so they had typos; I replaced them with automatically generated output, so please take a look.

At the moment, we do not lemmatize "'s" into "be" because we thought it could be either "be" or "have"; however, our current lemmatizer can make use of the context (it didn't before) so we should be able to handle this part. We'll include this feature for our next version.

https://github.com/emorynlp/nlp4j-morphology/issues/7

Thanks!

best,

Jinho
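
For readers following the thread: the ambiguity described above (a verbal "'s" can stand for either "be" or "have") can be illustrated with a small context rule. The sketch below is only an illustration and is not the NLP4J lemmatizer or its API. The function name `lemma_of_clitic_s`, the `(form, pos)` token representation, and the rule itself (treat "'s" as "have" only when the next word is "been" or "got") are all assumptions made for this example. A real context-aware lemmatizer would need far more than one lookahead word.

```python
# Hypothetical sketch, NOT the NLP4J implementation: pick a lemma for a
# verbal "'s" by looking at the word that follows it.

# Assumed cue words: "He's been ..." and "She's got ..." use auxiliary "have".
HAVE_CUES = {"been", "got"}


def lemma_of_clitic_s(tokens, index):
    """Return a lemma for tokens[index], where tokens is a list of (form, pos).

    POS tags are assumed to follow Penn Treebank conventions:
    "VBZ" marks a verbal "'s", while "POS" marks the possessive "'s".
    """
    form, pos = tokens[index]

    # Anything other than a verbal "'s" is returned lowercased and unchanged.
    if form.lower() != "'s" or pos != "VBZ":
        return form.lower()

    # Look one token ahead, if there is one.
    next_form = tokens[index + 1][0].lower() if index + 1 < len(tokens) else None

    # Crude context rule (an assumption of this sketch): "have" before a cue
    # word, "be" otherwise.
    return "have" if next_form in HAVE_CUES else "be"


copula = [("He", "PRP"), ("'s", "VBZ"), ("the", "DT"), ("founder", "NN")]
perfect = [("He", "PRP"), ("'s", "VBZ"), ("been", "VBN"), ("busy", "JJ")]
possessive = [("John", "NNP"), ("'s", "POS"), ("book", "NN")]

print(lemma_of_clitic_s(copula, 1))
print(lemma_of_clitic_s(perfect, 1))
print(lemma_of_clitic_s(possessive, 1))
```

The rule is deliberately conservative: forms such as "He's done it" (have) and "He's known for it" (be) both put a participle after "'s", so a participle tag alone cannot settle the lemma. That is why the decision needs real context modeling rather than a fixed table.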