
feature: generic plugin API to make plugins usable on any server #543

Open rob42 opened 5 years ago

rob42 commented 5 years ago

Currently the node server implements functionality by adding plugins implemented internally (in node), and the artemis server uses handlers, also internal.

This means we are rewriting the same functionality, and often producing different results. Further, writing plugins for node requires JS skills, and artemis requires Java skills. Other languages are not really practical (although some functionality is possible).

So basically we are creating a standard protocol, but not shared functionality. Given we have limited resources and a huge vision (to take over the world) we should share plugins, which requires a standard plugin API.

This plugin API should have the following features:

- be language agnostic
- provide security/sandboxing

This is largely possible now using Signal K over TCP: open a connection, subscribe to suitable data, send updates. But it's not flexible enough, and requires too much infrastructure, e.g. a Signal K client, complex message formats, and connection management.

I propose a simpler version, based on the handler technique used in artemis, which is proving very easy and flexible. Let's consider an anchor watch plugin.

1) We create a localhost TCP port to which a plugin must connect. On first connection it requests access to various signalk keys. (These are duly allowed/refused by the admin.) A token is provided for subsequent connections, etc.
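
For illustration, the first message on connect might look something like this (the shape is hypothetical, not part of any spec):

{
  "clientId": "anchor-watch-1",
  "read": ["vessels.366982330.navigation.position"],
  "write": ["vessels.366982330.navigation.anchor.*"]
}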

2) The signalk server streams all matching keys at the leaf level as JSON:

{"key":"vessels.366982330.navigation.position", 
   "value": {
            "longitude": 173.1693,
            "latitude": -41.156426,
            "altitude": 0
          }
}

These values can pass through a security filter, so they only match appropriate keys.

3) The anchor watch plugin collects lat/lons, and calculates the current radius to compare against maxRadius.

4) It sends back updates as JSON:

{
  "key":"vessels.366982330.navigation.anchor.currentRadius",
  "value": 55.34
}

The updates are also passed through the security filter, so the plugin cannot alter random data.

This results in a very simple implementation, and high performance. It's compatible with event-based execution, async messaging/queuing, and persistent storage. And it can be written in any language.
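
For illustration, a rough sketch of the plugin side in Python; the port, the framing as one JSON document per line, and the omission of the token handshake are all assumptions:

import json
import math
import socket

PLUGIN_PORT = 55555  # hypothetical localhost port assigned to this plugin

def distance_m(lat1, lon1, lat2, lon2):
    """Haversine distance between two positions, in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

anchor = None  # (lat, lon), captured from the first position received

with socket.create_connection(("localhost", PLUGIN_PORT)) as sock:
    stream = sock.makefile("rw")
    for line in stream:
        msg = json.loads(line)
        if not msg["key"].endswith("navigation.position"):
            continue
        lat = msg["value"]["latitude"]
        lon = msg["value"]["longitude"]
        if anchor is None:
            anchor = (lat, lon)  # first fix becomes the anchor position
            continue
        # A real plugin would compare this against maxRadius and raise an alarm.
        reply = {
            "key": "vessels.366982330.navigation.anchor.currentRadius",
            "value": distance_m(anchor[0], anchor[1], lat, lon),
        }
        stream.write(json.dumps(reply) + "\n")
        stream.flush()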

It allows plugins to be started in a single context (e.g. a Java VM or node instance) or externally as standalone apps. It works via localhost on the same server, and via the network, allowing heavy-duty analytics to run on separate hardware.

Since artemis uses message queues and transparent paging to disk, it's also ideal for intermittent connections like cloud servers. When the plugin reconnects, it simply continues sending the queue.

rob42 commented 5 years ago

Some additional notes on artemis handlers:

Because messages are queued (and persistent), it's very resilient to slow/intermittent consumers, and it's very easy to run multiple copies of the same handler to provide extra performance when it's needed.

sbender9 commented 5 years ago

Isn’t this all already possible with what we have in the spec today? A “plugin” is not really different from any other device/client on the network. We have already defined the security protocols. What’s missing is how to discover and load/install/run them.

rob42 commented 5 years ago

Yes, as I said above you can use the standard APIs, and they will be appropriate for some applications. But they require quite a lot of work to create, and are quite CPU intensive to service, especially at high message rates. With this API the plugin is very simple, and lives deep in the event stream. It makes it easy to write plugins for micro tasks like true wind from apparent, anchor alarm checks, or raising notifications, in just a few lines of code.

rob42 commented 5 years ago

BTW the find/load/run can be done the same way as webapps. See #542

tkurki commented 5 years ago

With a bit of experience with plugins I have noticed there is one very common use case: process some paths for self or all contexts. This is essentially a subscription, but the delta we now have is not a very convenient message format for this. To this end the current Node server plugin API has getSelfBus(path) and getBus(path), which return a stream of objects with the structure

{
  path: ...,
  value: ...,
  context: ...,
  source: ...,
  $source: ...,
  timestamp: ...
}

that is internally called a normalized delta, though I guess the proper name for it would be denormalized delta. This simplifies the client code, as it does not need to traverse the updates-values hierarchy: it knows what the data will contain.
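
For example, a single position update in this shape might look like (values illustrative):

{
  "path": "navigation.position",
  "value": { "longitude": 173.1693, "latitude": -41.156426 },
  "context": "vessels.366982330",
  "source": { "label": "gps.0", "type": "NMEA0183" },
  "$source": "gps.0",
  "timestamp": "2015-03-07T12:37:10.523Z"
}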

In the self case context is not really needed, but it's easier to deal with just one format.

I believe the format must have a timestamp, to allow for non-real-time handling ("intermittent and slow consumers" above). Source must also be there: we have previously made the mistake of adding multiple sources as an afterthought. It should be baked in early as a first-class concept.

So far the context + path structure has served us well, and breaking from it should be carefully thought through, as then the logical structure would be different and paths would be different, depending on where the data came from.

But is this really necessary? The goals you have stated can be reached with the existing delta streaming and subscriptions. "Process some paths for self or all contexts" sounds awfully like a subscription to me.

be language agnostic, provide security/sandboxing

I don't see any other solution than running the plugins in a separate process, with lesser privileges, and communicating with it via IPC or network protocols, the latter being the obvious choice. This you can achieve today with delta streaming (from & to server), authentication and access controls (read and write side). A new protocol would have to deal with the same things that are already implemented: how do you express what paths you want (subscriptions), and what rights the plugin has (authentication and access control).

AFAIK every plugin in node has full node permissions, not great when anyone can publish anything onto npm...

I do not disagree with the threat model here, just don't see how this would be any different with other programming languages.

Discounting how Artemis works internally, how is what you propose inherently more performant than delta streaming with subscriptions?

Being able to subscribe to exactly what you want has advantages on the plugin/client side. Then again I don't see how your proposal is significantly simpler to implement - you still need connection and authentication management, if you are operating over a network connection and not IPC.

One way to make things simpler would be to communicate just over stdin/out/err. In fact I recently implemented a wrapper for implementing plugins in Python. The server forks a child process and fires up the Python code. The Python child outputs deltas on stdout that the server reads. This would be easy enough to treat like a delta streaming connection with subscription support. You could drop the privileges of the child process to achieve security, and as the server knows the "identity" of the child it can apply security constraints on the child's subscriptions and output as it sees fit, with no authentication-related code needed on the plugin/client side.
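
A toy sketch of such a child in Python, assuming one JSON delta per line on stdin/stdout (the framing and the output notification path are illustrative assumptions, not necessarily what the wrapper does):

import json
import sys

# Read Signal K deltas from the server on stdin, one JSON document per line,
# and emit a derived delta on stdout when the depth reading is shallow.
for line in sys.stdin:
    delta = json.loads(line)
    for update in delta.get("updates", []):
        for pv in update.get("values", []):
            if pv["path"] == "environment.depth.belowTransducer" and pv["value"] < 3.0:
                out = {"updates": [{"values": [{
                    "path": "notifications.shallowDepth",  # illustrative path
                    "value": {"state": "alarm", "message": "Depth below 3 m"}
                }]}]}
                sys.stdout.write(json.dumps(out) + "\n")
                sys.stdout.flush()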

I think this kind of forked plugin model would allow language-agnostic, server-managed (find-install-activate) plugins.

One important use case for plugins is intercepting incoming data before the server processes it. This allows blocking and altering the data conditionally. The difference between derived data, like the anchor alarm example, and interception is that interception needs to happen synchronously / pre-emptively. This would mean that all data is passed to the plugin and not processed by the server, unless output back to the server by the plugin. Node server's plugin API has this functionality.
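
Conceptually such an interception hook is just a function that sees each delta before the server does, and either drops it, alters it, or passes it on. A sketch of the idea (not the actual Node API, which is registerDeltaInputHandler):

from typing import Callable

Delta = dict
Next = Callable[[Delta], None]

def input_handler(delta: Delta, next_handler: Next) -> None:
    # Runs before normal server processing. Not calling next_handler
    # blocks the delta; mutating it alters the data.
    for update in delta.get("updates", []):
        if update.get("$source") == "untrusted.0":  # illustrative source id
            return  # block: the server never sees this delta
    next_handler(delta)  # pass the (possibly modified) delta into the server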

Side note: Node-RED is gaining popularity as a swiss army knife for SK processing. I think it serves as an interesting example here considering what a plugin is: you can use a separate Node-RED process and connect it with the SK server via ws, but if you want to do interception you need to use the plugin version, which allows you to use the registerDeltaInputHandler functionality in the plugin API.

To summarise:

- the stated goals can be reached with the existing delta streaming, subscriptions, authentication and access controls
- a forked, lesser-privileged child process would give language-agnostic, server-managed plugins with no authentication code on the plugin side
- interception is a separate, synchronous use case that streaming alone does not cover

rob42 commented 5 years ago

@tkurki good points, my reasoning: denormalized deltas do need timestamp and $source, yes. I don't see the need for source, which can be looked up.

In artemis they are included in the message header (since the current implementation is internal), so they can be referenced.

I include context and $source in the path, so I get vessels.[uuid].navigation.courseOverGroundTrue.values.[$source].value. This has proven to work for every case so far and is very easy to parse for per-key security.

rob42 commented 5 years ago

forked plugin model: it can be used. Note that Java in particular can run JS, Python, and others inside the VM, in a security sandbox. Running as a separate process or in-VM is really an implementation issue.

Then again I don't see how your proposal is significantly simpler to implement - you still need connection and authentication management, if you are operating over a network connection and not IPC.

Let's consider the simplest case. You download a plugin and install it on port xxxx. Security rejects it; no data is processed. You read the docs, go into the server admin, where you see the plugin on port xxxx identified by a plugin-generated uuid. You follow the docs and allow read on paths x,y,z and write on paths a,b (regex allowed). The plugin starts to receive data, and posts replies.

At the plugin side it only needs to read from xxxx, do processing, and write to xxxx. Very simple. Either in a sandbox or as a separate process, it's very secure too.

A better plugin keeps sending a request for paths x,y,z,a,b until it receives data.

At the server side, it's very easy to filter data, especially with paths of the form vessels.[uuid].navigation.courseOverGroundTrue.values.[$source].value, using regex.
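
For example (the rule and the $source label are illustrative):

import re

# Per-plugin ACL: read access to one vessel's navigation subtree,
# whichever $source produced the value.
read_rule = re.compile(r"^vessels\.366982330\.navigation\..+")

key = "vessels.366982330.navigation.courseOverGroundTrue.values.nmea0183.value"
print(bool(read_rule.match(key)))  # True: this update is forwarded to the plugin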

rob42 commented 5 years ago

One important use case for plugins is intercepting incoming data before the server processes it.

This API would not be useful for interception. The artemis server has 'interceptors' on its queues, which is the same concept. The NMEA conversion is done by an interceptor, for instance, as is security.

Interception is a different API as it deals with raw incoming messages, so they could have many formats. But an API that simply diverted incoming messages to a plugin registered on a port would do this in a flexible way.

I think this is roughly what you do in node with piping? It's what I do in artemis with interceptors.

rob42 commented 5 years ago

Efficiency: the reason it's more efficient than subscriptions is the processing. Delta messages are first decomposed to a simple key/value format, aka the normalised data above.

For this API they are simply copied to a plugin as is, which is very little work. The replies are injected as key/values, so no processing there either.

For a subscription you need to send every key immediately, or the plugin's processing will not be realtime, e.g. 1s subscriptions will only update data every second. So for every key/value you need to generate a delta and send it. Any replies come back as a delta and need decomposing. If the plugin is monitoring a large number of keys, and those keys update quickly, then the overhead is significant.

Plus the plugin needs to decompose the delta and re-generate the reply as a delta.
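
To make the comparison concrete, the same single value in both forms (values illustrative):

{
  "context": "vessels.366982330",
  "updates": [{
    "$source": "gps.0",
    "timestamp": "2015-03-07T12:37:10.523Z",
    "values": [{ "path": "navigation.courseOverGroundTrue", "value": 2.61 }]
  }]
}

versus

{ "key": "vessels.366982330.navigation.courseOverGroundTrue", "value": 2.61 }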

tkurki commented 5 years ago

Interception is a different API as it deals with raw incoming messages, so they could have many formats.

I think there is definite value also in a Signal K format interception API that allows you to ignore or modify incoming data before it enters the server's processing (subscriptions, streaming, full model management etc). This is what's currently available in the Node server's plugin API, and people have used it successfully to alter and block incoming data, using the SK data model independent of the original data source.

To me dealing with raw sensor data, whatever it may be, is not in scope of the Signal K specification and APIs there.

tkurki commented 5 years ago

I don't see the need for source, which can be looked up

Agreed, it's there more for historical reasons than by design.

tkurki commented 5 years ago

You download a plugin and install it on port xxxx. Security rejects it; no data is processed. You read the docs, go into the server admin, where you see the plugin on port xxxx identified by a plugin-generated uuid. You follow the docs and allow read on paths x,y,z and write on paths a,b (regex allowed). The plugin starts to receive data, and posts replies.

Sorry, I don't follow you here. This is how a client, be it a sensor or a piece of code deriving some data, would proceed to work with the access request mechanism in the current specification. I don't see what additional mechanism is needed, or how it would be less work on the client side than with the current spec.

tkurki commented 5 years ago

I include context and $source in the path, so I get vessels.[uuid].navigation.courseOverGroundTrue.values.[$source].value. This has proven to work for every case so far

To me this is sort of like saying that you don't need to split data in a relational database into columns, because you can encode all the data into a single string and extract whatever you want with regular expressions. Sure you can do that, but you need to make all your code aware of the encoding mechanism, instead of providing your API/data structure users with a clear data structure with separate fields for separate values.

While I am no doubt guilty of bias towards http, ws and the way the Node server works, this really sounds like you are pushing Artemis's internal architecture into SK protocols.

tkurki commented 5 years ago

the reason it's more efficient ... For this API they are simply copied to a plugin as is, very little work. The replies are injected as key/values, so no processing either.

I agree with your points about decomposing and recomposing deltas. This is how for example subscriptions and an internal deltacache in Node server work.

So are we really looking at a denormalized (flat) delta, a 3rd data format that would be useful to add to Signal K? It would be especially useful in subscriptions and in producing data. For example, what if a client or a plugin could use the existing subscription mechanism (or some subset of it) and ask for data in denormalized (flat) delta format?
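
For example, the existing subscribe message could carry a new format value alongside the current delta/full ("flat" here is hypothetical):

{
  "context": "vessels.self",
  "subscribe": [{
    "path": "navigation.position",
    "policy": "instant",
    "minPeriod": 200,
    "format": "flat"
  }]
}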

rob42 commented 5 years ago

To me dealing with raw sensor data, whatever it may be, is not in scope of the Signal K specification and APIs there.

By 'raw' I mean deltas as received at the signalk server. The interception API should sit just after NMEA translation etc., so it deals with 'raw' signalk messages.

..include context and $source in the path, so I get vessels.[uuid].navigation.courseOverGroundTrue.values.[$source].value...While I am no doubt guilty of bias towards http, ws and the way the Node server works, this really sounds like you are pushing Artemis's internal architecture into SK protocols.

Actually that format was driven by influxdb's needs. But it's proven useful elsewhere, as it creates a unique path for any data we have in signalk, even with multiple values. (Artemis actually holds context, key, and $source in message headers, where it's filtered with a SQL-like syntax, as that's faster than reading the message body.)

For per-key security it's been really useful, as it allows very efficient matching of keys by regex. It seemed appropriate for this API too. The JSON object format would work too, but I suspect you would then need to instantiate the JSON string as an object and regex against multiple attributes. Probably slower.

rob42 commented 5 years ago

..include context and $source in the path, so I get vessels.[uuid].navigation.courseOverGroundTrue.values.[$source].value...

Either format works for me. It's also easy to convert one to the other, so whatever is simplest for plugin devs should decide it. BTW context has proven awkward with resources etc. when there is no uuid. A new format should avoid that.

rob42 commented 5 years ago

So are we really looking at denormalized (flat) delta, a 3rd data format

Yes. We need to be careful to make it as clean and simple as possible, or we will just recreate the delta problem with a different format: composing and decomposing it everywhere. As a starting point I'd propose

{
  path: (context + path)...,
  value: ...,
  $source: ...,
  timestamp: ...
}

When subscribed it should send all matching keys, including meta etc., as that's needed for use cases like anchor watch zones. That's still efficient when using policy instant. Also, the subscription needs to support at least wildcards in path for this to work.
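
A concrete instance, with context folded into the path as discussed above (values illustrative):

{
  "path": "vessels.366982330.navigation.position",
  "value": { "longitude": 173.1693, "latitude": -41.156426 },
  "$source": "gps.0",
  "timestamp": "2015-03-07T12:37:10.523Z"
}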