There exist two implementations of flowd:
Wire up components (programs) written in different programming languages, using the best features and available libraries of each.
Make them communicate in a network of components.
Build a data factory in which components transform the passed data frames to produce a useful output.
Components naturally make use of all available processor cores.
A component network can span multiple machines, lending itself for use in distributed systems. Routing is available and a load-balancing component exists.
Use available off-the-shelf components where you can. Grow a collection of specialized components and reuse them in each subsequent project.
Thus, rather than rewriting code anew for each project, you become more and more efficient with regard to human time spent on development.
This is the basic idea of Flow-based Programming (FBP), as pioneered by J. Paul Morrison.
flowd (short for flow daemon) is a runtime environment for executing FBP processing networks. A programmer defines the network, which then constitutes an application or processing system of some kind.
The act of programming thus shifts from typing lines of tailor-made source code to a more graphical, visual kind of programming with direct feedback on the changes just made. It is based on combining and connecting reusable black boxes that work together in a processing network, or application, which can be drawn and mapped visually.
Such an FBP processing network is not limited to linear pipes, ETL steps, a directed acyclic graph (DAG) structure, RPC request-response, client-server, publish-subscribe etc. Instead, it is a versatile and generic superset allowing processing networks spanning multiple flowd instances and non-FBP processing systems and thus the creation of general processing systems and even interactive applications.
You can find out more about this paradigm on J. Paul Morrison's website.
Moreover, humans are terrible at writing, maintaining and understanding code; see a talk about this. The solution proposed there is not to fundamentally improve the way software is engineered, but to keep using conventional programming and add another layer on top, namely using AI to generate ever more piles of non-reusable custom application code. Unmentioned in the talk: understanding and navigating that code will require even more AI. The alternative that FBP offers is to go in the other direction and keep applications at a humanly understandable level by using reusable black boxes, each of which is easy to understand, and connecting them to compose software. FBP processing networks are also humanly understandable because they match the steps a design team would use to break down the application's functionality, processing steps and data flows.
TODO Download
TODO Compile and install flowd and all example components
Run it with:
cargo run
Next, open the online editor. This loads the management application from a central server, but connects to your local runtime.
You should see a predefined test network. You can rearrange the components, start and stop the network, and so on. The output appears on your terminal.
It should look roughly like this:
For how to use the online editor, see the manual of noflo-ui.
Running in a micro-VM using Unikraft:
kraft run -M 256M -p 3569:3569
Note for macOS users: it is best to run the micro-VM via the QEMU network backend "vmnet", which was added by the developer of AxleOS.
TODO bots for example
TODO rewrite for flowd-rs and/or move to "developing applications"
Several example components and example processing networks are included.
Compile the network orchestrator and runtime flowd, then run examples like this:
bin/flowd src/github.com/ERnsTL/flowd/examples/chat-server.fbp
This particular example comprises a small chat or console server over TCP. When flowd starts, it should report that all components have started up and that the TCP server component is ready for connections.
Then connect to it using, for example:
nc -v localhost 4000
When you connect, you should see a message that the connection has been accepted. When you send data, you should see it pass to an intermediary copy component, then back to the response port of tcp-server, and out via TCP to your client.
The -quiet flag suppresses the frame-passing information, in case you do not want to see it.
The data flow is as follows:
TCP in -> tcp-server OUT port -> chat IN port -> chat server logic -> chat OUT port -> tcp-server IN port (responses) -> TCP out
Also, an initial information packet (IIP) is sent to the ARGS input port of the tcp-server component, as defined in the network specification:
'localhost:4000' -> ARGS tcp-server
This is the first packet/frame sent to this component. It usually contains configuration information and is used to parametrize this component's behavior. When sending an IIP to the ARGS port, this is converted to program arguments.
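As a rough illustration of the IIP-to-arguments idea, the following Python sketch parses one IIP line of the form shown above and turns its payload into an argument list. The names `parse_iip` and `iip_to_argv`, the regular expression and the shell-style splitting of the payload are assumptions made for illustration; the actual conversion inside flowd may differ.

```python
import re
import shlex

# Matches a line like: 'localhost:4000' -> ARGS tcp-server
IIP_LINE = re.compile(r"^'(?P<data>[^']*)'\s*->\s*(?P<port>\w+)\s+(?P<process>[\w-]+)$")

def parse_iip(line):
    """Split an IIP line into (data, port, process); raise ValueError if malformed."""
    match = IIP_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an IIP line: {line!r}")
    return match.group("data"), match.group("port"), match.group("process")

def iip_to_argv(line):
    """Hypothetical conversion: an IIP sent to ARGS becomes program arguments."""
    data, port, process = parse_iip(line)
    if port != "ARGS":
        raise ValueError("only IIPs addressed to the ARGS port become arguments")
    # Assumption: the payload is split like a shell command line.
    return [process] + shlex.split(data)

print(iip_to_argv("'localhost:4000' -> ARGS tcp-server"))
```

Under these assumptions the component would be started roughly as if `tcp-server localhost:4000` had been typed on a command line.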
A more complete, parser-exercising example is located in examples/example.fbp.
You can find out more about the .fbp network description grammar here:
flowd currently re-uses
TODO rewrite for flowd-rs
flowd can export the network graph structure into GraphViz format for visualization.
The following commands will export a network to STDOUT, convert it to a PNG raster image, view it and clean up:
bin/flowd -graph src/github.com/ERnsTL/flowd/examples/example.fbp | dot -O -Kdot -Tpng && eog noname.gv.png ; rm noname.gv.png
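To give an idea of what such an export involves, here is a minimal Python sketch that renders a list of connections as GraphViz DOT text. The function `to_dot`, the tuple layout and the edge labels are assumptions for illustration only; the DOT output that flowd itself produces may be structured differently.

```python
def to_dot(edges, name="network"):
    """Render (source, out_port, in_port, target) tuples as a GraphViz digraph."""
    lines = [f"digraph {name} {{"]
    for source, out_port, in_port, target in edges:
        # Label each edge with the ports it connects.
        lines.append(f'  "{source}" -> "{target}" [label="{out_port} -> {in_port}"];')
    lines.append("}")
    return "\n".join(lines)

# Connections taken from the chat-server data flow described earlier.
edges = [
    ("tcp-server", "OUT", "IN", "chat"),
    ("chat", "OUT", "IN", "tcp-server"),
]
print(to_dot(edges))
```

The printed text can be piped into `dot -Tpng` in the same way as the output of the command above.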
This is alpha software. It works and is quite optimized, but not all of the planned features are present yet, and it is not ready for business operations with continuity requirements. The API may change unexpectedly.
FBP network protocol:
Test suite of the FBP network protocol:
Graph support:
Component management:
Online editing:
Security:
Multi-language feature:
Multiple component APIs, component data formats:
Online network changes:
Component library:
Debugging, tracing:
Logging:
Component repository from local files:
Component hub/repository in the internet:
Deployment and reproducible setups:
Signaling, Monitoring:
Maintenance, Operations:
Testing:
Persistence:
Checkpointing:
Present in Go version to reach feature parity:
Everything else:
TODO connection, disconnection and reconnection (0.4 milestone):
TODO objects (0.5 milestone):
TODO interaction components (0.7 milestone):
TODO merge following into flowd-rs section on this
Currently present features:
.fbp network specifications
.drw network specifications made using DrawFBP
flowd as the orchestrator
The included example components cover:
Planned features:
Check the milestones on Github.
Basically, implement most functionality using in-memory data structures, then break down the structure into different parts (network backends, component API) and allow the component API to be fulfilled by components from shared objects, scripts etc.
Then add more components, port the Go components or add a wrapper for running them (running components as an external process using STDIN and STDOUT makes sense and will be one of the supported execution models).
Create first applications using these and add features to support these use-cases and evolve in tandem with these.
Finally, become production-ready with management, roles, ACLs, security, hardening overall, monitoring.
TODO
TODO rewrite and integrate
All components are either normal programs or scripts, which do not have to be specially modified to be used in a flowd network (wrapped in a cmd component), or they are programs which understand the flowd framing format. Raw binary data streams are also possible, e.g. a compressed data stream.
All components are started by the flowd program. It parses the network definition, defines the network connections, starts the components with arguments and handles network shutdown. It is also possible to start a network perfectly fine using a hand-written shell script, but having a declarative network definition and letting flowd manage it is easier.
A component can have multiple input and output ports. Ports are named. Without message framing (wrapped in a cmd component), input can be passed to an unmodified program and output can be used within the processing framework.
A component communicates with the outside world using named pipes which are handled using standard file operations. Over these, it receives input frames and can send frames to its named ports and thus to other components; the frames are sent directly to the other end of the named pipe, which is the input port of the neighbor component.
The framing format is a simple text-based format very similar to an HTTP/1.x header or a MIME message, which is also used for e-mail. Currently, a subset of STOMP v1.2 is used. It can easily be implemented in any programming language, is easy to extend and can carry a frame body in any currently trendy format, be it textual or binary. A frame contains information on (many fields are optional):
For more information on the framing format see the Go implementation source code and the prose format spec.
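The general shape of such a frame can be sketched in a few lines of Python. This sketch assumes the plain STOMP v1.2 layout (a command line, `key:value` header lines, an empty line, the body and a terminating NUL byte) and uses a hypothetical command name `data` and a hypothetical `port` header. The modifications flowd makes to STOMP are not reproduced here, so treat the Go source and the format spec as authoritative.

```python
def encode_frame(command, headers, body=b""):
    """Serialize a frame: command line, header lines, blank line, body, NUL."""
    # content-length lets the receiver read bodies that contain NUL or newlines.
    headers = {**headers, "content-length": str(len(body))}
    head = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(head) + "\n\n").encode("utf-8") + body + b"\x00"

def decode_frame(raw):
    """Parse one frame produced by encode_frame into (command, headers, body)."""
    head, _, rest = raw.partition(b"\n\n")  # the first empty line ends the headers
    command, *header_lines = head.decode("utf-8").split("\n")
    headers = dict(line.split(":", 1) for line in header_lines)
    length = int(headers["content-length"])
    return command, headers, rest[:length]

# "data" and "port" are illustrative names, not confirmed flowd fields.
frame = encode_frame("data", {"port": "OUT"}, b"hello")
print(frame)
print(decode_frame(frame))
```

Because the body length is carried in a header, the body itself can be arbitrary binary data, which matches the claim that any textual or binary format can be transported.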
Using several components, a network can be built. It is like a graph of components or like workers in a data factory doing one step in the processing. The application developer connects the output ports to other components' input ports and parameterizes the components. Most of the components will be off-the-shelf ones, though usually a few have to be written for the specific application project. In this fashion, the application is built.
TODO rewrite for flowd-rs
There exist several FBP runtimes, which emphasize different aspects of FBP and realize the underlying concept in different ways.
There are a few categories of FBP runtimes:
Single-language systems and libraries. Everything is running inside the same process and is written in the same programming language. Often, the network is defined using this programming language as well. This class has the best performance but is also very specialized.
Tightly integrated systems. These try to pull all components into the same process using dynamic loading of libraries (.so / .dll) and thus into the same address space to save on context switches. To communicate, shared memory is usually used. It is possible to integrate components written in different programming languages, but this requires strict conformance and conversion to a common binary message layout and flow of program execution (ABI, application binary interface). FBP processing networks are defined declaratively or using custom scripting languages.
Loosely coupled inter-language systems. The different components run as separate processes and communicate using sockets, named pipes, message queueing systems etc. This category requires little effort or tailoring and no special libraries to get started. Such systems can integrate components and even existing non-FBP-aware programs into their processing networks. The chosen data formats, protocols etc. are based on common, widespread formats which are easy to implement. Networks are usually defined in a declarative format.
The different flowd implementations have different approaches and focuses.
TODO rewrite for flowd-rs
As currently implemented, flowd-go positions itself at the high-performance end of the third category, without requiring conformance and internal data conversion to an ABI. Named pipes are among the fastest IPC mechanisms after shared memory. The framing format used is easy to implement and parse with modest processing overhead, which is assumed to be in the range of the data conversion overhead required by the second category for ABI conformance.
What cannot be removed in this third class is the overhead of copying data across process borders, which requires context switches and CPU ring switches. On the other hand, the process-level separation buys the capability to be pretty much universal in terms of integration with other FBP runtimes, free choice of programming languages for components as well as re-use of existing non-FBP programs. (And not to worry, flowd can still transfer several million IPs / messages per second on laptop-class hardware.)
In other words, flowd-go focuses on:
Programming language independence. Being able to easily mix different programming languages (PLs) in one FBP processing network makes it possible to combine the strengths of different languages and systems. To this end, flowd uses a simple way to communicate that is common to all programming languages: reading from and writing to files. If it were done in a more complicated way, it would become necessary to create a libflowd or similar for interfacing with flowd and to write bindings to it for each programming language. But not all PLs can import even a C library, let alone a library using some other calling convention, because it does not fit their way of computing or their internal representations and abstractions, and you cannot expect every PL to be able to import a library or package written in every other PL. Furthermore, all the bindings to such a libflowd would have to be maintained as well. Since this path is not desirable, communication using named pipes, which are just files, was chosen, with STDERR being used for any status output, log messages etc. STDIN and STDOUT can be used for terminal UI components. Also, if any complex data format were mandated (Protobuf, XML, JSON, ZeroMQ, MsgPack, etc.), this would lock out languages where it is not available. Since all this is neither desirable nor feasible, flowd cannot introduce any new protocols or data formats, nor require importing libraries or bindings for them. Therefore a very simple, text-and-line-based and even optional framing format, very similar to HTTP/1.x headers and MIME e-mail headers, is used, since strings and newlines are available in every programming language and will most likely remain available in times to come - well, unless the world decides not to use character strings any more ;-)
Re-use existing programs. Every program, even a Unix pipe-based processing chain, which can output results either to a file or STDOUT, can be wrapped and re-used by flowd in an FBP processing network. Therefore, flowd can be used to extend the Unix pipe processing paradigm to a superset, a directed graph model.
Easy to write components. Open the input named pipe file, read lines until you hit an empty line, parse the header fields, read the frame/IP body accordingly. Do some processing, write out a few lines of text = header lines, write out the body to the output named pipe. If the component has anything to report, write it to STDERR. No library to import, no complex data formats, no APIs.
Spreads across multiple cores. The FBP networks of flowd, like those of other FBP runtimes and systems, intrinsically spread out to multiple CPUs or CPU cores. In other systems this is because components are different threads; in flowd it is because they are separate processes. This makes it easy to saturate all CPU cores and process in parallel, simply by constructing a network of components which all just read from and write to files.
Can spread across multiple machines. It is simple to plug in a network transport component like TCP, TLS, KCP, SSH etc. or even pipe your frames into any external program (using the cmd component). Either the frame body (the data content) or the frames themselves can be sent, thus creating a bridge to another part of the FBP processing network. This enables the creation of distributed systems, so the FBP network concept can spread to and harness the computing power of multiple machines. Using the load-balancer component, a front-end can forward requests to one of multiple back-end processing networks, each of which can be taken offline individually for maintenance or updates.
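The "easy to write components" steps listed above can be sketched as a small Python loop. This is a minimal sketch under stated assumptions: frames follow the plain STOMP-like layout (command line, header lines, empty line, body, NUL byte), body length is given by a `content-length` header, and the helper names `read_frame`, `write_frame` and `run_component` are invented for this example. In-memory streams stand in for the named pipes so that the sketch runs anywhere.

```python
import io
import sys

def read_frame(stream):
    """Read one frame from a binary stream; return None on EOF."""
    command = stream.readline()
    if not command:
        return None  # EOF: the upstream component closed the port
    headers = {}
    while True:
        line = stream.readline().rstrip(b"\n")
        if not line:
            break  # an empty line ends the header section
        key, _, value = line.decode("utf-8").partition(":")
        headers[key] = value
    body = stream.read(int(headers.get("content-length", "0")))
    stream.read(1)  # consume the trailing NUL terminator (assumed, as in STOMP)
    return command.decode("utf-8").strip(), headers, body

def write_frame(stream, command, headers, body):
    """Write header lines, an empty line, the body and a NUL terminator."""
    headers = {**headers, "content-length": str(len(body))}
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    stream.write(("\n".join(lines) + "\n\n").encode("utf-8") + body + b"\x00")
    stream.flush()

def run_component(inport, outport, transform):
    """Process frames until EOF, applying transform to each frame body."""
    count = 0
    while (frame := read_frame(inport)) is not None:
        command, headers, body = frame
        write_frame(outport, command, headers, transform(body))
        count += 1
    print(f"processed {count} frames", file=sys.stderr)  # status goes to STDERR
    return count

# With real named pipes, the streams would come from open(path, "rb") / open(path, "wb").
inport = io.BytesIO(b"data\nport:IN\ncontent-length:5\n\nhello\x00")
outport = io.BytesIO()
run_component(inport, outport, bytes.upper)
print(outport.getvalue())
```

Only the standard library is used here, which is the point of the design: a component needs nothing beyond file reads, file writes and string handling.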
The downsides of the approach taken by flowd:
If you rather want to do FBP in Go, but prefer an in-process-communicating runtime/library for a single machine, then you might be interested in goflow or flowbase. Also check out the FBP runtimes and systems by J. Paul Morrison and NoFlo and their compatible runtimes.
TODO rewrite with flowd-rs
One feature of FBP is the ability to freely transform data. Thus, as a general solution, common IPC mechanisms like TCP, WebSocket or Unix domain sockets can be used to bridge FBP networks running in different FBP runtimes. flowd can also start other runtimes as subprocesses using the cmd component.
For more optimal and tighter integration, there are gateway components and protocols as follows:
TODO
TODO add criterion perf tracking
TODO rewrite section for flowd-rs
Three stages usually:
TODO difference is that this goes beyond ETL. It also goes beyond the DAGs, which seem fashionable these days.
TODO modeling the application in terms of what data is relevant and what structure it has, where the data comes from, how it should be transformed and which results should be produced (see JPM book).
TODO no conceptual dissonance between design and implementation stages.
TODO straight implementation, almost waterfall-like, fewer refactorings.
TODO Linear maintenance cost in relation to program size.
TODO rewrite for flowd-rs.
Decide whether your program shall implement the flowd framing format or be wrapped in a cmd component.
If wrapped, you can decide whether the program is called once for each incoming frame or whether it processes a stream of frame bodies. In the one-off mode, the program receives data on STDIN, which is closed after the frame body has been delivered; the program can then output a result, which is forwarded into the FBP network for further processing. It is then expected to either close STDOUT or exit. In the one-instance mode, STDIN and STDOUT remain open; your program receives data from incoming frame bodies, and any output is framed by the cmd component and again forwarded into the network.
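The one-off mode can be illustrated with a short Python sketch that starts a program once per frame body, feeds the body to STDIN, closes it and collects STDOUT. The function `run_one_off` is hypothetical and only mimics the behavior described above; it is not the cmd component's implementation, and a small inline Python script stands in for the unmodified program.

```python
import subprocess
import sys

def run_one_off(argv, frame_bodies):
    """Start the program once per frame body; collect each run's STDOUT."""
    results = []
    for body in frame_bodies:
        # input= writes the body to STDIN and then closes it, as described above.
        done = subprocess.run(argv, input=body, stdout=subprocess.PIPE, check=True)
        results.append(done.stdout)  # would be framed and forwarded downstream
    return results

# Stand-in for an unmodified program: upper-cases whatever arrives on STDIN.
program = [sys.executable, "-c",
           "import sys; sys.stdout.write(sys.stdin.read().upper())"]
print(run_one_off(program, [b"hello", b"flowd"]))
```

The one-instance mode would instead keep a single process alive and stream successive frame bodies through its open STDIN and STDOUT.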
Otherwise, implement the simple flowd framing format, which can be seen in the files libflowd/framing.go and libflowd/framing_test.go. It is basically STOMP v1.2 as specified, with the modifications mentioned there. This can be done using a small library for the programming language of your choice. Your component is expected to open the named pipes given and will then be connected with the neighbor components. Frames of type data and control are common. Especially important are the IIPs, denoted by their body type IIP, which are usually used for component configuration. Port closing is detected by a regular EOF on the named pipe; this usually signals that all data has arrived from the preceding component and that it has shut down; the pipe can also be re-opened if the use case calls for it. Components should forward existing headers from the incoming frames/IPs, because downstream connections might loop back to the sender, which then requires a header field for correlation, for example a TCP connection ID; so keep additional header fields intact. Packet tracing is also implemented using marker values in the header. Output frames, if any, are then to be sent to the output named pipes. That way, the frames from your component are sent directly to the component connected to the other side of the given output port, to be processed, filtered, sorted, stored, transformed and sent out as results to who knows where... That's it - it's up to you!
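The advice to keep additional header fields intact can be sketched as follows. The header names `port` and `conn-id` and the helper `make_reply` are assumptions chosen for illustration; the real field names are defined in the framing source files mentioned above.

```python
def make_reply(incoming_headers, body, out_port):
    """Build headers for an output frame, keeping unknown fields intact."""
    headers = dict(incoming_headers)  # copy, so every incoming field is forwarded
    headers["port"] = out_port        # hypothetical routing field
    headers["content-length"] = str(len(body))  # must match the new body
    return headers

# "conn-id" stands in for a correlation field such as a TCP connection ID.
incoming = {"port": "IN", "conn-id": "7", "content-length": "5"}
reply = make_reply(incoming, b"hello back", "OUT")
print(reply)
```

Copying first and overriding only the fields the component owns means a correlation field set upstream survives a round trip through components that know nothing about it.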
flowd
TODO rewrite for flowd-rs
Running tests:
GOPATH=`pwd` go test ./src/github.com/ERnsTL/flowd/...
Running benchmarks:
GOPATH=`pwd` go test -run=BENCHMARKSONLY -bench=. ./src/github.com/ERnsTL/flowd/libflowd/
Running tests for the JSON FBP network protocol: Follow the basic instructions (TODO which? what? where?), but initialize with the following:
fbp-init --name flowd --port 3000 --command "bin/flowd -olc localhost:3000 src/github.com/ERnsTL/flowd/examples/chat-server.fbp" --collection tests
Use the latest node.js and npm from nodesource, otherwise you may get WebSocket errors. The npm package wscat is useful for connection testing.
To make a Flow-Based Programming (FBP) runtime suitable for production operation and reliable applications, it should possess several key characteristics:
Reliability and Robustness: The FBP runtime should be stable and reliable to ensure uninterrupted execution of critical applications. It should be able to handle errors robustly to avoid or at least minimize failures.
Scalability: The runtime environment should be capable of handling growing demands and workloads to ensure efficient execution of applications. This may involve scaling vertically (on larger machines) or horizontally (by adding more instances).
Monitoring and Debugging: There should be mechanisms for monitoring and troubleshooting to analyze performance, identify bottlenecks, and debug issues. This can be achieved through logging, dashboards, tracing, and other tools.
Security: The runtime environment should provide security mechanisms to ensure data integrity, confidentiality, and availability. This may include authentication, authorization, encryption, and protection against attacks such as injection attacks and denial-of-service attacks.
Transaction Support: It is important that the FBP runtime supports transactions to ensure data consistency and meet the Atomicity, Consistency, Isolation, and Durability (ACID) properties.
Integration: The runtime environment should seamlessly integrate with other systems and services to support data flows between different applications, platforms, and external services. This can be done through APIs, protocols such as HTTP, messaging systems, and other mechanisms.
Documentation and Support: Comprehensive documentation and a supportive community can help increase developer productivity and resolve issues efficiently. A good FBP runtime should have clear documentation, tutorials, examples, and support channels.
By fulfilling these characteristics, an FBP runtime environment can be made suitable for production and reliable applications to support stable, scalable, and reliable workflows.
GNU LGPLv3+