CogStack / CogStack-NiFi

Building data processing pipelines for document processing with NLP, using Apache NiFi and related services
https://hub.docker.com/r/cogstacksystems/cogstack-nifi/

Introduction

This repository proposes a possible next step for the free-text data processing capabilities implemented as CogStack-Pipeline, shaping the solution more towards Platform-as-a-Service.

CogStack-NiFi contains example recipes that use Apache NiFi as the key data workflow engine, together with a set of services for document processing with NLP. Each component implementing key functionality, such as Text Extraction or Natural Language Processing, runs as a service, and Apache NiFi handles the data routing between the components and the data source/sink. Moreover, NLP services are expected to implement a uniform RESTful API so that they can be plugged easily into existing document processing pipelines, making it possible to use any NLP application in the stack.
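To illustrate what a uniform RESTful API buys you, here is a minimal sketch of an NLP service that wraps an arbitrary annotator behind a fixed request/response envelope. It uses only the Python standard library. The endpoint path `/api/process`, the JSON field names (`content`, `text`, `result`, `annotations`) and the toy `keyword_annotator` are assumptions made for this example, not the API specification used by this project; see the official documentation for the real one.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, List

# Hypothetical annotator signature: text in, list of annotation dicts out.
Annotator = Callable[[str], List[Dict]]


def keyword_annotator(text: str) -> List[Dict]:
    """Toy stand-in for a real NLP model: tags every occurrence of 'pain'."""
    annotations = []
    lowered = text.lower()
    start = lowered.find("pain")
    while start != -1:
        annotations.append({"start": start, "end": start + 4, "label": "SYMPTOM"})
        start = lowered.find("pain", start + 1)
    return annotations


def handle_request(body: bytes, annotate: Annotator) -> Dict:
    """Apply the same envelope whatever annotator is plugged in.

    The field names here are assumptions made for illustration.
    """
    payload = json.loads(body)
    text = payload["content"]["text"]
    return {"result": {"text": text, "annotations": annotate(text)}}


# Swap this for any other callable with the Annotator signature.
ANNOTATOR: Annotator = keyword_annotator


class ProcessHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed endpoint path; a workflow engine would POST documents here.
        if self.path != "/api/process":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            response = handle_request(self.rfile.read(length), ANNOTATOR)
        except (KeyError, TypeError, ValueError):
            # Malformed JSON or a payload missing the expected fields.
            self.send_error(400)
            return
        data = json.dumps(response).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


if __name__ == "__main__":
    # Port 5000 is an arbitrary choice for local experimentation.
    HTTPServer(("127.0.0.1", 5000), ProcessHandler).serve_forever()
```

Because the caller only depends on the envelope, replacing `keyword_annotator` with a different model does not require any change on the workflow side, which is the point of keeping the API uniform.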

Important

Please note that the project is under constant improvement, bringing new features or services that might impact current deployments. This might affect you, the user, when upgrading, so be sure to check the release notes and the documentation beforehand.

Asking questions

Feel free to ask questions on the GitHub issue tracker or on our Discourse website, which is frequently used by our development team!

Project organisation

The project is organised in the following directories:

Documentation and getting started

Knowledge requirements: Docker usage (mandatory), Python, Linux/UNIX understanding.

The official documentation is now available here.

As a good starting point, deployment walks through an example deployment with some workflow examples.

All issues are tracked in README; check that section before opening a bug report ticket.

Important news and updates

Please check IMPORTANT_NEWS for any major changes that might affect your deployment and for any security problems that have been discovered.