dkpro-bigdata

DKPro BigData makes it easy to run UIMA-based natural language processing pipelines on a Hadoop cluster.

Features

Large-scale NLP processing using UIMA and Hadoop
Store your corpora on a Hadoop file system and access them from local or distributed pipelines
Find patterns in your textual data using adaptable collocation extraction

Details

Execute DKPro pipelines on a Hadoop cluster with minimal adaptation
Read data stored on an HDFS file system using DKPro Collection Readers
Read serialized CASes from HDFS and write them back to it

Contributors:
Hans-Peter Zorn
Johannes Simon
Martin Riedl
Richard Eckart de Castilho
Steffen Remus

License

DKPro BigData is licensed under the Apache Software License (ASL), Version 2.0.

This project is a joint effort of UKP Lab and the Language Technology Group, Technical University of Darmstadt.
