Workflow Engine to Democratise Capabilities

Summary

This is a long read, so in short:

The workflow engine described here already exists, is open source and is ready to use now
The control plane can be hosted on a single-node Kubernetes cluster (k3d, kind or minikube is fine) or by a commercial control plane hosting provider
All that's required is to start modelling workflows

Background

The aim is to give as many people as possible access to tooling and workflows. These tools and capabilities must be as easy to consume as possible to ensure widespread use and adoption.

Problem Statements

1. Installing and using tools often requires advanced technical knowledge, which makes them unsuitable for widespread adoption.
2. Tooling and "providers" (the how) can change often, whereas the processes (or aims) rarely change.

Example of 1: Using the instagram-location-search capability requires, first, that you know what Python is; second, that you can install it; and third, that you are savvy enough to open dev tools and find your session ID. All of this adds an unnecessarily high barrier to entry.

Example of 2: Performing a phone number lookup (the aim or "what") will remain fairly static over time, whereas the tooling (the "how") is subject to potentially frequent change. If, tomorrow, I want to look up a number not from Instagram but from another provider, I need to write an entirely new program.

Solution

Provision a workflow engine in front of these capabilities with an easy to understand, standardised and open source eventing mechanism.
Provide the concept of sequences which can consist of one or more tasks. Optionally provide the ability to link sequences together in larger workflows.
Provide a standardised "control plane" that a user (or app) can interact with to trigger sequences and workflows.
Provide flexibility in where and how the "tooling" runs (alongside the control plane or remotely, in containers or "standard" processes).
Provide the ability to write tooling in any language; the consumer shouldn't need to know or care about the implementation
Provide flexibility to adjust tooling without altering the overall capability (e.g. swap Instagram for Tool X without impacting a user)
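
The last point can be illustrated with a small Python sketch. Nothing here is the engine's API: the provider functions, the PROVIDERS table and run_task are hypothetical names, and the lookups return canned data instead of calling a real service. The idea is only that the task name stays fixed while the implementation behind it is swapped.

```python
from typing import Callable, Dict

# Hypothetical providers: each takes a phone number and returns a result dict.
# A real provider would call an external service; these return canned data.
def instagram_lookup(number: str) -> dict:
    return {"number": number, "provider": "instagram", "found": True}

def tool_x_lookup(number: str) -> dict:
    return {"number": number, "provider": "tool-x", "found": True}

# The task name (the "what") maps to whichever provider (the "how") is current.
PROVIDERS: Dict[str, Callable[[str], dict]] = {"checknumber": instagram_lookup}

def run_task(task: str, number: str) -> dict:
    """Run the provider currently registered for a task name."""
    return PROVIDERS[task](number)

before = run_task("checknumber", "+15550100")

# Swap the tooling; callers keep asking for "checknumber" as before.
PROVIDERS["checknumber"] = tool_x_lookup
after = run_task("checknumber", "+15550100")
```

The caller's request is identical before and after the swap; only the entry in the table changed.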

Simple Sequence (Single Task)

Using the above example of a phone number lookup, we could model it as follows (stages model the available environments, in this case only the development environment):

spec:
  stages:
    - name: dev
      sequences:
        - name: sequence-one
          tasks:
          - name: checknumber
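
As a rough sketch of how a client might address this sequence, the Python below writes the spec out as plain data and builds (but does not send) an HTTP POST carrying a trigger event. The event type follows the stage.sequence.triggered pattern of this one example. Everything else is an assumption made for illustration: the base URL, the /event path, the JSON body shape and the helper names are hypothetical, not the engine's documented API.

```python
import json
import urllib.request

# The spec above, written out as plain Python data.
SPEC = {
    "stages": [
        {
            "name": "dev",
            "sequences": [
                {"name": "sequence-one", "tasks": [{"name": "checknumber"}]}
            ],
        }
    ]
}

def trigger_event_type(stage: str, sequence: str) -> str:
    """Event type that starts a sequence, e.g. dev.sequence-one.triggered."""
    return f"{stage}.{sequence}.triggered"

def build_trigger_request(base_url: str, stage: str, sequence: str, data: dict):
    """Build the POST request; the path and body shape are assumptions."""
    body = json.dumps(
        {"type": trigger_event_type(stage, sequence), "data": data}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url + "/event",  # hypothetical endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

stage = SPEC["stages"][0]
sequence = stage["sequences"][0]
request = build_trigger_request(
    "http://localhost:8080", stage["name"], sequence["name"], {"number": "+15550100"}
)
# urllib.request.urlopen(request) would send it; left out so the sketch runs offline.
```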

If someone wants to check a phone number:

They HTTP POST a dev.sequence-one.triggered event
The workflow engine knows that it must now trigger a task called checknumber so the workflow engine triggers a checknumber.triggered event
The "Phone number checker" service listens for checknumber.triggered events and knows that it must start working
The "Phone number checker" service informs the engine that it has started by sending a checknumber.started event
Optionally, during execution, the "Phone number checker" service can send updates with the checknumber.status.changed event
The "Phone number checker" service has interacted with the Instagram APIs and done its work. It sends a checknumber.finished event, carrying any result, back to the engine.
The workflow engine knows that the checknumber task is done and that it is the only task in the sequence, therefore the sequence is done.
The final result is returned to the user (or web app)
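
The steps above can be sketched as a small in-memory simulation. This is not the real engine: in practice the engine and the service are separate processes exchanging events, whereas here both are plain Python functions fed from one queue. The function names, the event dict shape and the canned lookup result are assumptions made for illustration.

```python
from collections import deque

# Mirrors the spec above: one stage, one sequence, one task.
SPEC = {"dev": {"sequence-one": ["checknumber"]}}

def engine(event, state):
    """Return the events the engine emits in response to one incoming event."""
    kind = event["type"]
    if kind == "dev.sequence-one.triggered":
        state["pending"] = list(SPEC["dev"]["sequence-one"])
        return [{"type": state["pending"][0] + ".triggered", "data": event["data"]}]
    if state["pending"] and kind == state["pending"][0] + ".finished":
        state["pending"].pop(0)
        if not state["pending"]:
            # Only task in the sequence is done, so the sequence is done.
            state["result"] = event["data"]
    return []

def phone_number_checker(event):
    """Hypothetical service: reacts only to checknumber.triggered."""
    if event["type"] != "checknumber.triggered":
        return []
    number = event["data"]["number"]
    return [
        {"type": "checknumber.started", "data": {}},
        {"type": "checknumber.status.changed", "data": {"message": "querying provider"}},
        {"type": "checknumber.finished", "data": {"number": number, "found": True}},
    ]

def run(trigger):
    """Deliver every event to both parties until no events remain."""
    state = {"pending": [], "result": None}
    log = []
    queue = deque([trigger])
    while queue:
        event = queue.popleft()
        log.append(event["type"])
        queue.extend(engine(event, state))
        queue.extend(phone_number_checker(event))
    return log, state["result"]

log, result = run({"type": "dev.sequence-one.triggered", "data": {"number": "+15550100"}})
```

The log ends up holding the event types in the order listed above, and result holds whatever data the .finished event carried.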

Complex Sequence (Multiple Tasks)

Take a more complex sequence which consists of multiple tasks. This one may model a process to judge a photo's authenticity:

spec:
  stages:
    - name: dev
      sequences:
        - name: sequence-two
          tasks:
          - name: extract-metadata
          - name: geolookup
          - name: author-search
          - name: evaluate

In this sequence, the engine won't start the geolookup task until it receives a .finished event from the extract-metadata task. The final task assumes that we have some metrics (perhaps from the previous steps) which we can use to judge the authenticity of an image. The evaluate task takes, as input, a set of metrics and their corresponding thresholds, and ultimately produces a decision (pass, warning or fail) and a score which indicates the authenticity.
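
The ordering rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the engine's code; next_event is a hypothetical helper, and being stateless it does not guard against .finished events arriving out of order.

```python
from typing import List, Optional

# Task order taken from the spec above.
TASKS = ["extract-metadata", "geolookup", "author-search", "evaluate"]

def next_event(tasks: List[str], incoming: str) -> Optional[str]:
    """Return the event type the engine should emit next, or None."""
    if incoming == "dev.sequence-two.triggered":
        return tasks[0] + ".triggered"
    for i, task in enumerate(tasks):
        if incoming == task + ".finished":
            if i + 1 < len(tasks):
                return tasks[i + 1] + ".triggered"
            return None  # last task finished, so the sequence is done
    # .started and .status.changed events do not advance the sequence
    return None
```

For example, next_event(TASKS, "extract-metadata.started") returns None, while next_event(TASKS, "extract-metadata.finished") returns "geolookup.triggered".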

The evaluate logic is already available out of the box with the engine, so no additional implementation is required here. Just tell the engine:

What are the metrics?
Where are the metrics to be retrieved from?
What are the thresholds?

The engine will do the evaluation and produce the report.
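
To give a feel for what such an evaluation involves, here is a minimal Python sketch. It is not the engine's built-in logic, which needs no code from you. The metric names, the threshold values, the points per grade and the 90 / 75 score cut-offs are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

# Hypothetical thresholds: metric -> (pass at or above, warning at or above).
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "metadata_consistency": (0.8, 0.5),
    "location_match": (0.7, 0.4),
    "author_credibility": (0.6, 0.3),
}

# Assumed scoring scheme: full points for pass, half for warning, none for fail.
POINTS = {"pass": 1.0, "warning": 0.5, "fail": 0.0}

def grade(value: float, pass_min: float, warn_min: float) -> str:
    """Grade one metric against its two thresholds (higher is better)."""
    if value >= pass_min:
        return "pass"
    if value >= warn_min:
        return "warning"
    return "fail"

def evaluate(metrics, thresholds, pass_score=90.0, warn_score=75.0):
    """Return (decision, score out of 100, per-metric grades)."""
    per_metric = {
        name: grade(metrics[name], *limits) for name, limits in thresholds.items()
    }
    score = 100.0 * sum(POINTS[g] for g in per_metric.values()) / len(per_metric)
    if score >= pass_score:
        decision = "pass"
    elif score >= warn_score:
        decision = "warning"
    else:
        decision = "fail"
    return decision, score, per_metric

# Made-up metric values, as if collected by the earlier tasks.
METRICS = {"metadata_consistency": 0.9, "location_match": 0.5, "author_credibility": 0.7}
decision, score, per_metric = evaluate(METRICS, THRESHOLDS)
```

With these made-up numbers two metrics pass and one lands in the warning band, so the overall score falls between the two cut-offs and the decision is a warning.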

As I say, the engine and all the eventing logic are ready. Reach out if you want to trial some sequence and task definitions.
