Named Entity Recognition Vetting Environment
This is a web service that lets you upload an XML document, run Stanford NER to recognize entities, and look up and add URIs for new or pre-existing entities. The current version supports three schemas: TEI (Text Encoding Initiative), Orlando (the Orlando Project's biography and writing schemas), and CWRC (Canadian Writing Research Collaboratory).
License and documentation are forthcoming.
These instructions will cover how to download and build the NERScriber .jar file from source. The .jar file is not the web service but rather contains the bulk of the logic for the web service.
Prerequisites: Maven, Java
Note: paths are system dependent.
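The prerequisites can be checked before building. The following is a minimal sketch, not part of the repository: `require_cmds` is a hypothetical helper name, and it only tests that each command is on the PATH, not which version is installed.

```shell
# Hypothetical helper (not shipped with NERVE): report every required
# command that cannot be found on the PATH.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing prerequisite: $cmd" >&2
      missing=1
    fi
  done
  # Exit status 0 when everything was found, 1 otherwise.
  return "$missing"
}
```

For example, `require_cmds mvn java` prints one line per missing tool and returns a non-zero status if any is absent.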
git clone git@github.com:cwrc/NERVE.git (with SSH key)
git clone https://github.com/cwrc/NERVE.git (without key; GitHub no longer serves the unauthenticated git:// protocol)
cd NERVE/NERScriber
The setup script in /NERScriber compiles the .jar file and copies the relevant files to a directory of your choice.
./NERScriber/setup.sh ./test
mvn package
cd test
java -jar NERScriber.jar
You should see the following:
usage: nerscriber [-c config_file] [-x context_file] [--ner] [--link] input_file
Options:
-c specify the configuration file (default: ./config.properties)
-x specify the context file, (default: auto-detect from 'context.path' in config)
--ner perform NER tagging
--link perform link fill in
To run the program on a file, specify the file location and provide a configuration file. You must pass --ner, --link, or both (order does not matter); otherwise no action is taken. The output goes to stdout.
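Because the output goes to stdout and nothing happens without an action flag, a small wrapper can guard against both surprises. This is a sketch under assumptions: `has_action` and `run_nerscriber` are hypothetical names, and `NERScriber.jar` is assumed to be in the current directory, as in the steps above.

```shell
# Hypothetical helper: succeeds only if --ner or --link is among the arguments.
has_action() {
  for arg in "$@"; do
    case "$arg" in
      --ner|--link) return 0 ;;
    esac
  done
  return 1
}

# Hypothetical wrapper: run NERScriber and save its stdout to a file.
# Usage: run_nerscriber OUTPUT_FILE [options...] INPUT_FILE
run_nerscriber() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_nerscriber OUTPUT_FILE [options...] INPUT_FILE" >&2
    return 2
  fi
  out=$1
  shift
  # Refuse to run when no action flag was given.
  if ! has_action "$@"; then
    echo "error: pass --ner, --link, or both" >&2
    return 2
  fi
  # Assumes NERScriber.jar sits in the current directory.
  java -jar NERScriber.jar "$@" > "$out"
}
```

A call such as `run_nerscriber tagged.xml -c ./config.properties --ner --link input.xml` would then write the result to `tagged.xml`; both file names are placeholders.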
See the Dockerfile for details.
mvn install (makes the jar available as a local Maven dependency)
cd Service; mvn package war:war (creates the war file for Apache Tomcat)
The basic usage to build a test environment:
docker-compose build
docker-compose up -d
Rebuild:
docker-compose build --no-cache --parallel
docker-compose build --force-rm --no-cache --pull --parallel
docker-compose up --build --force-recreate -d
Peek inside the container instance:
docker-compose exec webapp bash
Input: JSON with Content-Type: application/json
curl -i -X POST -H "Content-Type: application/json" \
-d @./test_documents/nerve_test_cwrc_tei_lite.json http://localhost:6642/ner
Input: XML with Content-Type: text/xml
curl -i --verbose -X GET -H "Content-Type: text/xml" \
-d @./test_documents/orlando_biography_template.xml http://localhost:6642/ner
Input: JSON as sent by CWRC-Writer in 2019 (see issue #85)
curl -i -X POST -H "Content-Type: application/x-www-form-urlencoded" \
-d @./test_documents/nerve_test_cwrc_tei_lite.json http://localhost:6642/ner
Input: custom context file, testing default attributes (e.g., resp or type) and a shared element name (e.g., rs with a type attribute to record entities)
curl -i -X POST -H "Content-Type: application/x-www-form-urlencoded" \
-d @./test_documents/nerve_test_cwrc_tei_lite_custom_context_rs.json http://localhost:6642/ner
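The requests above differ mainly in their Content-Type header, so they can be wrapped in one helper. This is a sketch only: `content_type_for`, `send_to_nerve`, and the `NERVE_URL` variable are hypothetical names, and the mapping covers just the JSON and XML cases (the form-urlencoded variant is not handled). The method defaults to POST; pass GET as the second argument to mirror the XML example. It uses `--data-binary` rather than `-d` because curl strips newlines from a file passed with `-d @file`.

```shell
# Hypothetical helper: map a document's extension to the Content-Type
# used in the JSON and XML examples above.
content_type_for() {
  case "$1" in
    *.json) echo "application/json" ;;
    *.xml)  echo "text/xml" ;;
    *)      echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: send a document to the /ner endpoint.
# Usage: send_to_nerve FILE [METHOD]
# NERVE_URL is an assumed override for this script, not a service setting.
send_to_nerve() {
  ctype=$(content_type_for "$1") || return 1
  curl -i -X "${2:-POST}" -H "Content-Type: $ctype" \
    --data-binary @"$1" "${NERVE_URL:-http://localhost:6642/ner}"
}
```

For instance, `send_to_nerve ./test_documents/nerve_test_cwrc_tei_lite.json` corresponds to the first JSON request above.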
More details are in the API section of the wiki.