Support for ClearNlp 3

reckart commented 9 years ago

This should probably go into a separate module so we can have the old ClearNLP around
for another while.

Here is the release announcement for ClearNLP 3:

---

Folks,

After a long delay, ClearNLP version 3 is finally here; in fact, the latest version
is 3.0.2. Here are the key changes in this version.

- ClearNLP is now developed by the Center for Language and Information Research at Emory
  University.
- Our Maven group ID has changed from com.clearnlp to edu.emory.clir.
- All our repositories have moved from github.com/clearnlp to github.com/clir.
- Version 3.0.0 is written from scratch. All components in this version show a
  significant speed-up over the previous ones (2-3 times), and the statistical models
  consume less disk and memory space.
- Statistical models for general, medical, and bioinformatics domains are provided (see
  here; the medical and bioinformatics models will be uploaded by March 25th, 2015).
- The tokenizer preserves non-UTF8 characters as they are; previously, they were converted
  to their UTF8 equivalent characters (e.g., smart double quotes to ").
- The dependency parser is back to greedy parsing, which makes the model much smaller
  (about 18 times less disk space) and parsing much faster (about 10K tokens per second
  on an Intel Xeon CPU) without sacrificing much accuracy (about 0.5% lower).
- This version does not include the semantic role labeler. There have been many changes
  in PropBank, and we decided to spend another month developing a new semantic role
  labeler. The semantic role labeler will be ready in May 2015.
- We are preparing a named entity recognizer and a coreference resolution system. These
  systems will be ready in August 2015.
- Better documentation is provided at our guidelines project, with more details about
  training, decoding, javadoc, etc.
- ClearNLP is no longer a single-person project; the whole NLP research team at Emory
  University is working on it and we're very excited about the potential of this project.

Please give us your feedback so we can make this better. I'm finally back on track
developing and improving new and old components in ClearNLP, so I will be much more prompt.

Thanks and I hope everyone is enjoying the Spring.

best,

Jinho

---

Original issue reported on code.google.com by richard.eckart on 2015-03-24 10:53:27
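
For anyone downstream who relied on the old tokenizer behaviour mentioned in the announcement (smart quotes being converted to plain "), here is a minimal sketch of how that normalization could be reproduced outside ClearNLP. This is not ClearNLP code: the class name, the method name `toAsciiQuotes`, and the four-character mapping are assumptions made for illustration, and the announcement does not say which characters the old tokenizer actually converted.

```java
import java.util.Map;

// Illustrative sketch only; not part of ClearNLP or DKPro Core.
class QuoteNormalizationSketch {

    // Assumed mapping: the announcement only names smart double quotes,
    // the single-quote entries are added here as a plausible extension.
    private static final Map<Character, Character> ASCII_EQUIVALENTS = Map.of(
            '\u201C', '"',   // left double quotation mark
            '\u201D', '"',   // right double quotation mark
            '\u2018', '\'',  // left single quotation mark
            '\u2019', '\''   // right single quotation mark
    );

    // Replaces each mapped character with its ASCII equivalent and
    // leaves every other character untouched.
    static String toAsciiQuotes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = ASCII_EQUIVALENTS.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\u201CHello\u201D";
        // With ClearNLP 3 the tokenizer would keep `raw` as it is;
        // applying this step beforehand mimics the older conversion.
        System.out.println(toAsciiQuotes(raw));
    }
}
```

Because the mapping is one character to one character, the string length does not change, so any token offsets computed on the normalized text should still line up with the original document text.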

reckart commented 9 years ago

ClearNLP version 3.1.0 is released.

- A new component for named entity recognition is added, which shows state-of-the-art
  accuracy on both CoNLL'03 and OntoNotes data (a paper describing our approach is under
  submission).
- All statistical models are upgraded; the part-of-speech tagger and the dependency parser
  use features extracted from distributional semantics, which give more robust results
  on unseen data.
- The dependency parser is trained on data from our new dependency conversion, which adapts
  many concepts from the universal dependency structures and introduces some new useful
  labels such as "dative".
- Components for semantic role labeling and coreference resolution will be added in June.

Please let me know if you have any questions or suggestions about ClearNLP. Thank you!

best,

Jinho

Original issue reported on code.google.com by richard.eckart on 2015-04-30 08:36:33

Labels added: Module-clearnlp

reckart commented 9 years ago

ClearNLP 3.1.1 is released.

- Word embedding lexicons are removed from the global lexica; they didn't add much accuracy
  but took up a lot of RAM. Furthermore, the gazetteers for named entity recognition
  are now separated from the global lexica for better modularity (see models for more
  details).
- The core dictionary is updated; some past-tense verbs that were recognized as base verbs
  are now fixed.
- The named entity recognition model is updated.
- See pom.xml for all updated Maven dependencies.

We'll be making many more good updates in the summer, so please stay tuned. Thank
you!

best,

Jinho

Original issue reported on code.google.com by richard.eckart on 2015-05-08 13:13:36

reckart commented 8 years ago

Closing as duplicate of #792

dkpro / dkpro-core

Support for ClearNlp 3 #603