This project collects a number of core libraries for Natural Language Processing (NLP) developed by the Cognitive Computation Group.
Each library contains a detailed README and instructions on how to use it. In addition, the Javadoc for the whole project is available here.
Module | Description |
---|---|
nlp-pipeline | Provides an end-to-end NLP processing application that runs a variety of NLP tools on input text. |
core-utilities | Provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc. |
corpusreaders | Provides classes to read documents from corpora into core-utilities data structures. |
curator | Supports use of CogComp NLP Curator, a tool to run NLP applications as services. |
edison | A library for feature extraction from core-utilities data structures. |
lemmatizer | An application that uses WordNet and simple rules to find the root forms of words in plain text. |
tokenizer | An application that identifies sentence and word boundaries in plain text. |
transliteration | An application that transliterates names between different scripts. |
pos | An application that identifies the part of speech (e.g. verb + tense, noun + number) of each word in plain text. |
ner | An application that identifies named entities in plain text according to two different sets of categories. |
md | An application that identifies entity mentions in plain text. |
relation-extraction | An application that identifies entity mentions, then identifies relation pairs among the detected mentions. |
quantifier | This tool detects mentions of quantities in the text and normalizes them to a standard form. |
inference | A suite of unified wrappers to a set of optimization libraries, as well as some basic approximate solvers. |
depparse | An application that identifies the dependency parse tree of a sentence. |
verbsense | This system addresses the verb sense disambiguation (VSD) problem for English. |
prepsrl | An application that identifies semantic relations expressed by prepositions and develops statistical learning models for predicting the relations. |
commasrl | This software extracts relations that commas participate in. |
similarity | This software compares objects, especially strings, and returns a score indicating how similar they are. |
temporal-normalizer | A temporal extractor and normalizer. |
dataless-classifier | Classifies text into a user-specified label hierarchy from just the textual label descriptions. |
external-annotators | A collection of useful external annotators. |
To include one of the modules in your Maven project, add the following snippet with the `#modulename#` and `#version#` entries replaced with the relevant module name and the version listed in this project's pom.xml file. Note that you also need to add the `<repository>` element for the CogComp Maven repository in the `<repositories>` element.

    <dependencies>
        ...
        <dependency>
            <groupId>edu.illinois.cs.cogcomp</groupId>
            <artifactId>#modulename#</artifactId>
            <version>#version#</version>
        </dependency>
        ...
    </dependencies>
    ...
    <repositories>
        <repository>
            <id>CogCompSoftware</id>
            <name>CogCompSoftware</name>
            <url>http://cogcomp.org/m2repo/</url>
        </repository>
    </repositories>
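
If you generate build files with a script, the placeholder substitution described above can be sketched in a few lines of Java. This is only an illustration, not part of the project: the class name `DependencySnippet`, the method `render`, and the values `example-module` and `0.0.0` are all hypothetical. Use the real module name and the version from this project's pom.xml instead.

```java
import java.util.Objects;

// Hypothetical helper: fills in the #modulename# and #version# placeholders
// of the <dependency> element shown above. Not part of the CogComp libraries.
public class DependencySnippet {

    // Template mirroring the <dependency> element from the snippet above.
    private static final String TEMPLATE = String.join("\n",
            "<dependency>",
            "    <groupId>edu.illinois.cs.cogcomp</groupId>",
            "    <artifactId>#modulename#</artifactId>",
            "    <version>#version#</version>",
            "</dependency>");

    // Returns the template with both placeholders replaced.
    // Rejects null or empty values so no placeholder is left half-filled.
    public static String render(String moduleName, String version) {
        Objects.requireNonNull(moduleName, "moduleName");
        Objects.requireNonNull(version, "version");
        if (moduleName.isEmpty() || version.isEmpty()) {
            throw new IllegalArgumentException("module name and version must be non-empty");
        }
        return TEMPLATE
                .replace("#modulename#", moduleName)
                .replace("#version#", version);
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only; they do not name a real artifact.
        System.out.println(render("example-module", "0.0.0"));
    }
}
```

The rendered element goes inside the `<dependencies>` section of your pom.xml; the `<repository>` entry is unchanged and is needed regardless of which module you pick.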
If you are using the framework, please cite our paper:

    @inproceedings{2018_lrec_cogcompnlp,
        author = {Daniel Khashabi and Mark Sammons and Ben Zhou and Tom Redman and Christos Christodoulopoulos and Vivek Srikumar and Nicholas Rizzolo and Lev Ratinov and Guanheng Luo and Quang Do and Chen-Tse Tsai and Subhro Roy and Stephen Mayhew and Zhili Feng and John Wieting and Xiaodong Yu and Yangqiu Song and Shashank Gupta and Shyam Upadhyay and Naveen Arivazhagan and Qiang Ning and Shaoshi Ling and Dan Roth},
        title = {CogCompNLP: Your Swiss Army Knife for NLP},
        booktitle = {11th Language Resources and Evaluation Conference},
        year = {2018},
        url = "http://cogcomp.org/papers/2018_lrec_cogcompnlp.pdf",
    }