SAP / neonbee

A reactive dataflow engine, a data stream processing framework using Vert.x
https://neonbee.io
Eclipse Public License 2.0
39 stars 17 forks source link

🐝 NeonBee Core

Discord REUSE status Coverage

NeonBee is an open source reactive dataflow engine, a data stream processing framework using Vert.x.

Description

NeonBee abstracts most of Vert.x's low-level functionality by adding an application layer for modeling a dataflow, data stream processing and for doing data consolidation, exposing everything via standardized RESTfull APIs in form of interfaces called endpoints.

Additionally NeonBee takes care of all the execution and comes bundled with a full-blown application server with all the core capabilities of Vert.x at hand but without you having to care about things like a boot sequence, command line parsing, configuration, monitoring, clustering, deployment, scaling, logging / log handling and much more. We put a rich convenience layer for data handling on top, simplifying technical data exchange.

To achieve this simplification, NeonBee reduces the scope of Vert.x by choosing and picking the most appropriate core and extensions components of Vert.x and providing them in a pre-configured / application server like fashion. For example, NeonBee comes with a default configuration of the SLF4J logging facade and Logback logging backend, so you won't have to deal with choosing and configuring the logging environment in the first place. However, in case you decide to go with the logging setup NeonBee provides, you are still free to change.

NeonBee vs. Vert.x

After reading the introduction, you might ask yourself when do I need NeonBee or when to choose NeonBee over Vert.x? Summarizing a few of our main advantages, that will let you help choose. If you say "no, I do that myself anyways", you might stay with Vert.x for now!

Core Components

To facilitate the true nature of NeonBee, it features a certain set of core components, abstracting and thus simplifying the naturally broad spectrum of the underlying Vert.x framework components:

Dataflow Processing

While you may just use the NeonBee core components to consume functionalities of Vert.x more easily, the main focus of NeonBee lies on data processing via its stream design. Thus, NeonBee adds a sophisticated high-performance data processing interface you can easily extend plug-and-play. The upcoming sections describe how to use NeonBees data processing capabilities hands-on. In case you would like to understand the concept of NeonBees dataflow processing more in detail, for instance on how different resolution strategies can be utilized, for a highly optimized traversal of the data tree, please have a look at this document explaining the theory behind NeonBees data processing.

Data Verticles / Sources

The main component for data processing is more or less a specialization of the verticle concept of Vert.x. NeonBee introduces a new AbstractVerticle implementation called DataVerticle. These types of verticles implement a very simple data processing interface and communicate between each other using the Vert.x Event Bus. Processing data using the DataVerticle becomes a piece of cake. Data retrieval was split in two phases or tasks:

  1. Require: Each verticle first announces the data it requires from other even multiple DataVerticle for processing the request. NeonBee will, depending on the resolution strategy (see below), attempt to pre-fetch all required data, before invoking the next phase of data processing.
  2. Retrieve: In the retrieval phase, the verticle either processes all the data it requested in the previous require processing phase, or it perform arbitrary actions, such as doing a database calls, or plain data processing, mapping, etc.

Conveniently, the method signatures of DataVerticle are named exactly like that. So, it is very easy to request / consolidate / process data from many different sources in a highly efficient manner.

During either phase, data can be requested from other verticles or arbitrary data sources, however, it is to note that those kinds of requests start spanning a new three, thus they can only again be optimized according to the chosen resolution strategy in their sub-tree. It is best to stick with one require / retrieval phase for one request / data stream, however it could become necessary mixing different strategies to achieve the optimal result, depending on the use case.

Entity Verticles & Data Model

Entity verticles are an even more specific abstraction of the data verticle concept. While data verticles can really deal with any kind of data, for data processing it is mostly more convenient to know how the data is structured. To define the data structure, NeonBee utilizes the OASIS Open Data Protocol (OData) 4.0 standard, which is also internationally certified by ISO/IEC.

OData 4.0 defines the structure of data in so-called models. In NeonBee, models can be easily defined using the Core Data and Services (CDS) Definition Language. CDS provide a human-readable syntax for defining models. These models can then be interpreted by NeonBee to build valid OData 4.0 entities.

For processing entities, NeonBee uses the Apache Olingo™ library. Entity verticles are essentially data verticles dealing with Olingo entities. This provides NeonBee the possibility to expose these entity verticles in a standardized OData endpoint, as well as to perform many optimizations when data is processed.

Data Endpoints

Endpoints are standardized interfaces provided by NeonBee. Endpoints call entry verticles (see dataflow processing) to fetch / write back data. Depending on the type of endpoint, different APIs like REST, OData, etc. will be provided and a different set of verticles gets exposed. Given the default configuration, NeonBee will expose two simplified HTTP endpoints:

Getting Started

This Kickstart Guide is recommended for a first start with NeonBee.

Further NeonBee examples can be found in this repository.

Our Road Ahead

We have ambitious plans for NeonBee and would like to extend it to be able to provide a whole platform for dataflow / data stream processing. Generalizing endpoints, so further interfaces like OpenAPI can be provided or extending the data verticle concept by more optimized / non-deterministic resolution strategies are only two of them. Please have a look at our roadmap for an idea on what we are planning to do next.

Contributing

If you have suggestions how NeonBee could be improved, or want to report a bug, read up on our guidelines for contributing to learn about our submission process, coding rules and more.

We'd love all and any contributions.