cwrc / NERVE

Named Entity Recognition Vetting Environment
GNU General Public License v2.0

The repository moved to GitLab (https://gitlab.com/calincs/conversion/NSSI) in March 2020 and was renamed NSSI.


NERVE

Named Entity Recognition Vetting Environment

This is a web service that lets you upload an XML document, run Stanford NER to recognize entities, and look up and add URIs for new or pre-existing entities. The current version supports three schemas: TEI (Text Encoding Initiative), Orlando (the Orlando Project's biography and writing schemas), and CWRC (Canadian Writing Research Collaboratory).

License and documentation are forthcoming!

Building the .jar from source

These instructions cover how to download and build the NERScriber .jar file from source. The .jar file is not the web service itself; it contains the bulk of the web service's logic.

Prerequisites: Maven, Java
Note: paths are system dependent.
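
Before cloning, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, assuming the prerequisites are installed under their usual executable names (git, mvn, java); missing_tools is a hypothetical helper and not part of NERVE.

```shell
#!/bin/sh
# Sketch: report which command-line tools are not available on PATH.
# The executable names passed in (e.g. git, mvn, java) are assumptions about
# how the prerequisites are installed on your system.

missing_tools() {
    # Print the name of every argument that cannot be found as a command.
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Calling missing_tools git mvn java prints nothing when all three commands are found, and otherwise prints one missing name per line.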

1. Check out the repository

git clone git@github.com:cwrc/NERVE.git      # SSH (with key)
git clone https://github.com/cwrc/NERVE.git  # HTTPS (without key)
cd NERVE/NERScriber

2. Build the project

A setup script in the NERScriber directory compiles the .jar file and copies the relevant files to a directory of your choice (./test in the example below).

./NERScriber/setup.sh ./test

mvn package

3. Run the program

cd test

java -jar NERScriber.jar

You should see the following:

usage: nerscriber [-c config_file] [-x context_file] [--ner] [--link] input_file

Options:
-c              specify the configuration file (default: ./config.properties)
-x              specify the context file, (default: auto-detect from 'context.path' in config)
--ner           perform NER tagging
--link          perform link fill in

To run the program on a file, specify the input file location and provide a configuration file (via -c, or ./config.properties by default). You must pass --ner, --link, or both (order does not matter); otherwise no action is taken. Output goes to stdout.
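
The rule above (at least one of --ner or --link, with output on stdout) can be sketched as a small wrapper. This is an illustration only: run_nerscriber and the JAVA and NERSCRIBER_JAR variables are hypothetical names introduced here, not part of the tool.

```shell
#!/bin/sh
# Wrapper sketch around the documented command line:
#   nerscriber [-c config_file] [-x context_file] [--ner] [--link] input_file
# JAVA and NERSCRIBER_JAR are hypothetical overrides for the java executable
# and the .jar location; they are assumptions made for this example.

run_nerscriber() {
    # Refuse to run unless at least one action flag is present, because the
    # program takes no action without --ner or --link.
    has_action=no
    for arg in "$@"; do
        case "$arg" in
            --ner|--link) has_action=yes ;;
        esac
    done
    if [ "$has_action" = no ]; then
        echo "error: pass --ner, --link, or both" >&2
        return 2
    fi
    # Forward all arguments unchanged; the result is written to stdout.
    "${JAVA:-java}" -jar "${NERSCRIBER_JAR:-NERScriber.jar}" "$@"
}
```

For example, run_nerscriber -c ./config.properties --ner --link input.xml > output.xml would save the result to output.xml, where input.xml and output.xml are placeholder file names.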

Building the Apache Tomcat Web App (outside Docker as of 2019-11-14)

Building the Apache Tomcat Web App (within a Docker Compose environment)

The basic usage to build a test environment

Testing the API via curl

Input: JSON with Content-Type: application/json

curl -i -X POST -H "Content-Type: application/json" \
  -d @./test_documents/nerve_test_cwrc_tei_lite.json  http://localhost:6642/ner

Input: XML with Content-Type: text/xml

curl -i --verbose -X GET -H "Content-Type: text/xml" \
  -d @./test_documents/orlando_biography_template.xml http://localhost:6642/ner

Input: JSON as sent by CWRC-Writer in 2019 (see issue #85)

curl -i -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  -d @./test_documents/nerve_test_cwrc_tei_lite.json  http://localhost:6642/ner

Input: custom context file (tests default attributes, e.g., resp or type, and reuse of a single element name, e.g., rs with a type attribute to record entities)

curl -i -X POST -H "Content-Type: application/x-www-form-urlencoded" \
  -d @./test_documents/nerve_test_cwrc_tei_lite_custom_context_rs.json  http://localhost:6642/ner
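
The curl calls above can be turned into a quick pass/fail check. The sketch below posts a JSON test document to the same /ner endpoint and looks only at the HTTP status code. Treating any 2xx status as success is an assumption rather than documented behaviour of the service, and NERVE_URL, CURL, and the function names are hypothetical.

```shell
#!/bin/sh
# Smoke-test sketch for the /ner endpoint used in the curl examples.
# Assumptions: a 2xx status means success; NERVE_URL and CURL are hypothetical
# overrides for the endpoint URL and the curl executable.

is_success() {
    # True when the HTTP status code in $1 is in the 2xx range.
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

ner_status() {
    # POST the JSON file named in $1 and print only the HTTP status code.
    "${CURL:-curl}" -s -o /dev/null -w '%{http_code}' \
        -X POST -H "Content-Type: application/json" \
        -d @"$1" "${NERVE_URL:-http://localhost:6642/ner}"
}

smoke_test() {
    # Report ok or FAILED for one test document.
    status=$(ner_status "$1") || true
    if is_success "$status"; then
        echo "ok ($status): $1"
    else
        echo "FAILED ($status): $1" >&2
        return 1
    fi
}
```

With the service running locally, smoke_test ./test_documents/nerve_test_cwrc_tei_lite.json reports whether the request was accepted.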

More details are in the API section of the wiki.