arr-ai / arrai

The ultimate data engine.
http://arr.ai
Apache License 2.0

Plugin architecture #531

Open marcelocantos opened 4 years ago

marcelocantos commented 4 years ago

Purpose

An important form of interaction with the world is invoking external programs and communicating with them over stdin, stdout, network protocols, etc. It would be useful to do this from arr.ai, but we want some degree of control over how it's implemented so as to avoid unfettered statefulness.

Suggested approaches

Rather than providing an unfettered process-running model, a model similar to Erlang ports might be in order. Instead of Erlang's messaging approach, however, arr.ai might expose a plugin as a function that takes an arr.ai value and returns an arr.ai value.

Functions are called via a protobuf message (#597) delivered over stdin with a simple framing protocol. Results are returned over stdout via a similar protocol, which includes the ability to deliver results out of order.
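To make the framing idea concrete, here is a minimal Go sketch of one possible wire format. Everything specific in it is an assumption for illustration, not a decided design: the `Frame` type, the 4-byte length plus 8-byte request ID header, and the `maxPayload` limit are all hypothetical. The payload is treated as opaque bytes; in the proposal it would be the protobuf-encoded arr.ai value from #597. The request ID is what lets the host match results to calls when they come back out of order.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Frame is a hypothetical wire unit: a request ID plus an opaque payload.
type Frame struct {
	ID      uint64 // correlates a result with the call that produced it
	Payload []byte // would hold a protobuf-encoded arr.ai value (#597)
}

// maxPayload is an arbitrary safety limit chosen for this sketch.
const maxPayload = 16 << 20

// WriteFrame writes a 4-byte big-endian payload length, an 8-byte
// big-endian request ID, and then the payload itself.
func WriteFrame(w io.Writer, f Frame) error {
	if len(f.Payload) > maxPayload {
		return fmt.Errorf("payload too large: %d bytes", len(f.Payload))
	}
	var hdr [12]byte
	binary.BigEndian.PutUint32(hdr[:4], uint32(len(f.Payload)))
	binary.BigEndian.PutUint64(hdr[4:], f.ID)
	if _, err := w.Write(hdr[:]); err != nil {
		return err
	}
	_, err := w.Write(f.Payload)
	return err
}

// ReadFrame reads one frame. It returns io.EOF only when the stream
// ends cleanly between frames.
func ReadFrame(r io.Reader) (Frame, error) {
	var hdr [12]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return Frame{}, err
	}
	n := binary.BigEndian.Uint32(hdr[:4])
	if n > maxPayload {
		return Frame{}, fmt.Errorf("payload too large: %d bytes", n)
	}
	f := Frame{ID: binary.BigEndian.Uint64(hdr[4:]), Payload: make([]byte, n)}
	if _, err := io.ReadFull(r, f.Payload); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // stream ended inside a frame
		}
		return Frame{}, err
	}
	return f, nil
}

// demoOutOfOrder simulates a plugin answering request 2 before request 1
// and shows the host matching results to requests by ID.
func demoOutOfOrder() (map[uint64]string, error) {
	var pipe bytes.Buffer // stands in for the plugin's stdout
	replies := []Frame{
		{ID: 2, Payload: []byte("second")},
		{ID: 1, Payload: []byte("first")},
	}
	for _, f := range replies {
		if err := WriteFrame(&pipe, f); err != nil {
			return nil, err
		}
	}
	got := map[uint64]string{}
	for {
		f, err := ReadFrame(&pipe)
		if err == io.EOF {
			return got, nil
		}
		if err != nil {
			return nil, err
		}
		got[f.ID] = string(f.Payload)
	}
}
```

The same `WriteFrame`/`ReadFrame` pair could serve both directions (requests over stdin, results over stdout), which is one way to read "a similar protocol" above.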

Plugins will be configured via the externals capability (#528) and exposed via a special import syntax. Current thinking is //<myplugin> (mnemonic: < and > are redirection operators in most shells). Passing command-line parameters would not be supported. Parameterisation should be via stdin. Each request would be expected to contain all the necessary context to perform the work, rather than having one request set things up for subsequent requests.

In future, to discourage developers from baking in implied state, the framework might intentionally randomise delivery of requests (but ensure that the max latency is no more than a few ms and amortised latency is indistinguishable from immediate delivery). This assumes lazy semantics and parallel execution, neither of which is in play in the current implementation.

To support the canonical Unix pipes and filters model, it should also be possible to define a plugin that interacts with the subprocess by passing and returning byte arrays, which are delivered over stdin/stdout. There should also be a way to pass dynamic command-line parameters to such plugins, since we can't assume an agreed protocol for configuring over stdin. Supporting bidirectional streaming will be difficult, so the MVP could be that each call instantiates the plugin with the supplied command-line args, passes the supplied array to stdin in one go, consumes stdout in a background thread, and returns the output stream as a lazy byte array.
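A rough Go sketch of that MVP behaviour, using only the standard `os/exec` package. The names `RunFilter`, `runFilterAll` and `filterOutput` are hypothetical, and the returned `io.ReadCloser` merely stands in for the lazy byte array; how that would map onto an arr.ai value is not addressed here.

```go
package main

import (
	"bytes"
	"io"
	"os/exec"
)

// filterOutput exposes the subprocess's stdout as a stream and reaps
// the process when the caller closes it.
type filterOutput struct {
	stdout io.ReadCloser
	cmd    *exec.Cmd
}

func (o *filterOutput) Read(p []byte) (int, error) { return o.stdout.Read(p) }

func (o *filterOutput) Close() error {
	// Closing our end first unblocks a child that is still writing,
	// so Wait cannot hang if the caller stops reading early.
	o.stdout.Close()
	return o.cmd.Wait()
}

// RunFilter starts a fresh process per call with the supplied args,
// feeds input to its stdin in one go, and returns stdout as a stream
// that is read on demand.
func RunFilter(name string, args []string, input []byte) (io.ReadCloser, error) {
	cmd := exec.Command(name, args...)
	// os/exec copies a non-file Stdin to the child in a background goroutine.
	cmd.Stdin = bytes.NewReader(input)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return &filterOutput{stdout: stdout, cmd: cmd}, nil
}

// runFilterAll is a convenience wrapper that forces the whole stream.
func runFilterAll(name string, args []string, input []byte) (string, error) {
	out, err := RunFilter(name, args, input)
	if err != nil {
		return "", err
	}
	data, readErr := io.ReadAll(out)
	closeErr := out.Close()
	if readErr != nil {
		return "", readErr
	}
	return string(data), closeErr
}
```

For example, on a system with `tr` installed, `runFilterAll("tr", []string{"a-z", "A-Z"}, []byte("hello"))` pipes the bytes through the filter and returns the upper-cased text.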

Post-MVP, implement the following redux model:

//<plugin>(
    initState,
    \(:state, :event) ->
        (
            # If not present, don't output anything.
            send: next-bytes-to-send-expr,

            # If not present, retain old state.
            state: new-state-expr,

            # If present, return only when at least int-expr bytes have been received (or EOF).
            # Otherwise, get the next available bytes, only waiting for at least one byte to arrive.
            wait: int-expr,

            # If present, append array-expr to the lazy array returned by the plugin invocation.
            append: array-expr,

            # If present and true, return immediately.
            done: true-or-false,
        ),
)

Immediately after the plugin is instantiated, the callback should be called with an "init" event. This offers a way to specify the initial send and wait values.
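To get a feel for how small the host-side loop for the redux model could be, here is a hedged Go sketch. `Action`, `Event`, `Reducer`, `Drive` and `receive` are all hypothetical names, state is modelled as `interface{}` rather than an arr.ai value, and two behaviours are assumptions the text above does not settle: the loop makes one final callback at EOF and then stops, and an absent `wait` means "at least one byte".

```go
package main

import (
	"bytes"
	"io"
	"strings"
)

// Action mirrors the tuple returned by the callback; zero values mean "not present".
type Action struct {
	Send   []byte      // bytes to write to the plugin's stdin
	State  interface{} // nil retains the old state
	Wait   int         // minimum bytes to receive before the next callback
	Append []byte      // bytes appended to the invocation's result
	Done   bool        // stop and return the result
}

// Event is what the callback receives: the "init" event, or received bytes.
type Event struct {
	Init bool
	Data []byte
	EOF  bool
}

type Reducer func(state interface{}, ev Event) Action

// receive blocks until at least wait bytes (or at least one byte if wait < 1)
// have arrived, or the stream has ended.
func receive(r io.Reader, wait int) ([]byte, bool, error) {
	need := wait
	if need < 1 {
		need = 1
	}
	size := 4096
	if need > size {
		size = need
	}
	buf := make([]byte, size)
	n, err := io.ReadAtLeast(r, buf, need)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		return buf[:n], true, nil
	}
	return buf[:n], false, err
}

// Drive runs the callback loop against a plugin's stdin and stdout.
func Drive(stdin io.Writer, stdout io.Reader, state interface{}, reduce Reducer) ([]byte, error) {
	var out []byte
	ev := Event{Init: true}
	for {
		act := reduce(state, ev)
		if act.State != nil {
			state = act.State
		}
		out = append(out, act.Append...)
		if len(act.Send) > 0 {
			if _, err := stdin.Write(act.Send); err != nil {
				return out, err
			}
		}
		if act.Done || ev.EOF { // assumption: nothing more to do after EOF
			return out, nil
		}
		data, eof, err := receive(stdout, act.Wait)
		if err != nil {
			return out, err
		}
		ev = Event{Data: data, EOF: eof}
	}
}

// demoDrive uses in-memory stand-ins for the plugin's pipes: it sends "ping"
// on init, waits for 4 bytes, and appends whatever arrives to the result.
func demoDrive() (sent, result string, err error) {
	var stdin bytes.Buffer
	stdout := strings.NewReader("pong")
	reduce := func(state interface{}, ev Event) Action {
		switch {
		case ev.Init:
			return Action{Send: []byte("ping"), Wait: 4}
		case ev.EOF:
			return Action{Append: ev.Data, Done: true}
		default:
			return Action{Append: ev.Data, State: state.(int) + len(ev.Data)}
		}
	}
	out, err := Drive(&stdin, stdout, 0, reduce)
	return stdin.String(), string(out), err
}
```

Under this shape, the intended MVP model falls out as a special case: a callback that sends the whole input on "init" and then appends every received chunk until EOF.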

The redux approach might not be as hard as it sounds. Consider doing it for MVP and supporting the intended MVP model as a special case.

We should also consider allowing Go plugins as arr.ai plugins.