NewsReader is a natural language processing pipeline. Among other things, it tags parts of speech, recognizes named entities, and annotates entities with predicates.
There are a number of implementations of the NewsReader pipeline:
At the moment, none of these implementations successfully builds the whole pipeline for Dutch (see the issue tracker). We have therefore decided to build the pipeline from individual modules.
We have imported all modules from NewsReader under the heading "Dutch modules":
These modules depend on the following software packages:
The goal is to construct a lightweight, portable pipeline, which we achieve through a Docker image. This image is available from Docker Hub and can be obtained by pulling:
docker pull evidence/newsreaderdutch
If you would like to make changes and build the image yourself, run:
docker image build -t newsreaderdutch NewsReaderDutch/
from within the root of the repository.
The Docker container can be run directly on your text files by calling:
docker run -v /workspace/:/work/ newsreaderdutch /work/file.txt
where /workspace/ is your local directory containing the files that need to be processed and file.txt is the document that you would like to have annotated. If you pulled the image from Docker Hub instead of building it yourself, use evidence/newsreaderdutch as the image name. The output will have the same filename, but with a *.naf extension. Currently, the pipeline also writes the output of each module to a separate file.
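To annotate several documents in one go, the run command above can be wrapped in a small loop. The following is only a sketch under stated assumptions, not part of the repository: the function name annotate_dir and the DOCKER and IMAGE variables are hypothetical helpers introduced here, and the image name defaults to the locally built newsreaderdutch tag.

```shell
#!/bin/sh
# Sketch: run the pipeline once per .txt file in a local directory.
# Assumptions (not from the repository):
#   - IMAGE defaults to the locally built tag "newsreaderdutch";
#     set IMAGE=evidence/newsreaderdutch if you pulled from Docker Hub.
#   - DOCKER can be overridden (e.g. DOCKER=echo) to print the commands
#     instead of running them.

annotate_dir() {
    workspace=$1                          # absolute path to the local directory
    image=${IMAGE:-newsreaderdutch}
    docker_cmd=${DOCKER:-docker}

    for f in "$workspace"/*.txt; do
        [ -e "$f" ] || continue           # no .txt files: nothing to do
        name=$(basename "$f")
        # Mount the directory at /work/ and pass the file's path inside the container.
        "$docker_cmd" run -v "$workspace":/work/ "$image" "/work/$name"
    done
}
```

After sourcing the snippet, calling annotate_dir /workspace processes every .txt file in /workspace. Setting DOCKER=echo first gives a dry run that only prints the docker commands.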
Questions, comments, and bug reports can be submitted to the issue tracker.