Pattern: Enable/Disable of plugins

BrianAdams commented 7 years ago

In progress...

We have a single logical application that we are deploying across multiple ROV products. We want the system to gracefully degrade functionality if the platform it is deployed on cannot service features available in the application. For example, if the system does not have lights, we should not show buttons and options for enabling/disabling lights.

A few use cases help rationalize the design:

Making plugins available based on the application (cockpit or dropcam)
The ROV does or does not have lights
The IMU, which was sending accelerometer data, switches modes and no longer sends that data
Given that a plugin can be spread over multiple layers of the application, how does the entire system enable/disable it together?

Assumptions:

Our entire system is built on top of decoupled messaging. We should continue to leverage that pattern as much as possible.
If a plugin is not needed on a platform, it would be good not to load it at all.

Note: We also dove deep into the concerns regarding throttling of messages. We need to prevent saturating the transport. Today we have node.js managers that simply pick up topics from the cockpit socket.io connection and place them on/off the rov bus. This allows the node.js manager to introduce buffering and caching. The problem is that we have one manager per plugin and the code is very boilerplate. If we instead use a single channel adapter (rov bus <--> channel adapter <--> socket client), we can have one piece of code that enforces QoS for all available topics. (This does not eliminate per-plugin managers, but it does remove the communication concern from them; a plugin manager that was doing nothing but communication can then be eliminated.)
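A minimal sketch of the single channel adapter's QoS role, assuming a simple policy that the proposal does not specify: each topic is forwarded at most once per interval, and while a topic is throttled only its newest value is kept. `ChannelAdapter`, `publish`, and `flush` are hypothetical names, and time is passed in explicitly so the policy is easy to test.

```typescript
// Hypothetical QoS policy for one adapter shared by all topics:
// forward each topic at most once per `minIntervalMs`; while throttled,
// keep only the newest value (older pending values are dropped).
type Send = (topic: string, payload: unknown) => void;

class ChannelAdapter {
  private readonly send: Send;
  private readonly minIntervalMs: number;
  private readonly lastSent = new Map<string, number>();
  private readonly pending = new Map<string, unknown>();

  constructor(send: Send, minIntervalMs: number) {
    this.send = send;
    this.minIntervalMs = minIntervalMs;
  }

  // Called for every message taken off the rov bus.
  publish(topic: string, payload: unknown, nowMs: number): void {
    const last = this.lastSent.get(topic);
    if (last === undefined || nowMs - last >= this.minIntervalMs) {
      this.emit(topic, payload, nowMs);
    } else {
      this.pending.set(topic, payload); // coalesce: newest value wins
    }
  }

  // Called periodically (e.g. from a timer) to release throttled values.
  flush(nowMs: number): void {
    for (const [topic, payload] of this.pending) {
      const last = this.lastSent.get(topic) ?? -Infinity;
      if (nowMs - last >= this.minIntervalMs) {
        this.emit(topic, payload, nowMs);
      }
    }
  }

  private emit(topic: string, payload: unknown, nowMs: number): void {
    this.pending.delete(topic);
    this.lastSent.set(topic, nowMs);
    this.send(topic, payload);
  }
}
```

Because every plugin's traffic passes through the same object, per-topic intervals or other buffering rules could be added in one place instead of in each plugin manager.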

Prior Art:

https://github.com/OpenROV/openrov-software/issues/555

Overview:

The devices / device managers are responsible for broadcasting the state of the device onto the bus.
The system should take advantage of the last-value-cache service so that clients that come online can interrogate the bus for the state of all devices in the system.
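A minimal in-memory sketch of the last-value-cache idea, assuming a single process and one handler signature; `LastValueCacheBus` is a hypothetical name, not an existing service. Each topic keeps only its most recent payload, and a subscriber that joins late first receives that snapshot and then live updates.

```typescript
type Handler = (topic: string, payload: unknown) => void;

// Hypothetical bus with a last-value cache: remembers the latest payload per
// topic so a client coming online can learn the current state of every device.
class LastValueCacheBus {
  private readonly lastValues = new Map<string, unknown>();
  private readonly handlers: Handler[] = [];

  publish(topic: string, payload: unknown): void {
    this.lastValues.set(topic, payload); // overwrite: only the latest is kept
    for (const handler of this.handlers) handler(topic, payload);
  }

  // Replay the cached snapshot to the new subscriber, then deliver live updates.
  subscribe(handler: Handler): void {
    for (const [topic, payload] of this.lastValues) handler(topic, payload);
    this.handlers.push(handler);
  }
}
```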

Interrogating the bus for device state

To allow the last-value-cache service to propagate the state of multiple devices, topic names are namespaced. Since a vehicle may contain several devices of the same type, all sending the same class of messages, we need to be able to namespace by "topic:message:device".
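A sketch of building and parsing namespaced topics, assuming ":" is the separator, the device id is always the last segment, and no segment contains ":" itself. The `TopicParts` shape and both helper names are hypothetical; the example topics are the ones used in this proposal.

```typescript
// Hypothetical layout: "deviceClass:message[:subMessage...]:deviceId"
interface TopicParts {
  deviceClass: string;
  message: string[]; // e.g. ["state"] or ["state", "orientation"]
  deviceId: string;
}

function buildTopic(parts: TopicParts): string {
  return [parts.deviceClass, ...parts.message, parts.deviceId].join(":");
}

// Returns undefined when the topic lacks class, message, and device segments.
function parseTopic(topic: string): TopicParts | undefined {
  const segments = topic.split(":");
  if (segments.length < 3) return undefined;
  return {
    deviceClass: segments[0],
    message: segments.slice(1, -1),
    deviceId: segments[segments.length - 1],
  };
}
```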

Note: Today we are logically using a single room in socket.io. You can think of each room as an isolated message bus that a client must explicitly join.

This allows listeners to take advantage of the wildcard support in many platforms. For instance, a system that tracks the status of all lights could subscribe to "light:state:*" to get the state of every light device.
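A sketch of segment-wise wildcard matching, assuming "*" stands for exactly one ":"-separated segment. Platforms differ here (MQTT, for example, uses `+` for one level and `#` for many), so `topicMatches` is a hypothetical helper rather than any particular platform's API.

```typescript
// "*" matches exactly one ":"-separated segment; every other segment must
// match literally, and both topics must have the same number of segments.
function topicMatches(pattern: string, topic: string): boolean {
  const want = pattern.split(":");
  const have = topic.split(":");
  return (
    want.length === have.length &&
    want.every((segment, i) => segment === "*" || segment === have[i])
  );
}
```

Under this one-segment rule, "IMU:state:*" matches "IMU:state:3" but not a deeper sub-state topic such as "IMU:state:orientation:3", so high-rate sub-state traffic stays opt-in.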

Regarding the content of state messages: if a device has metadata, such as the position of a light, that metadata should be embedded in the state message. That way, a client only has to listen to a single state message to get all relevant data about the device.

Every device needs to publish a basic "deviceClass:state" message that can be used to determine if the device is active in the system or not.
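A sketch of a light state payload and the "is this device active?" check, assuming the payload carries a `device` field and uses the {device:"na"} marker described under "Disabling devices". `LightState`, its optional fields, and `isDeviceActive` are hypothetical.

```typescript
// Hypothetical payload of a "light:state:<deviceId>" message. Metadata such
// as mounting position travels in the same message, so one subscription per
// device gives a client everything it needs.
interface LightState {
  device: string;    // device id, or "na" once the device is removed
  level?: number;    // current output level (units not specified here)
  position?: string; // assumed free-form mounting position, e.g. "front"
}

// Active = a state message has been seen and it does not mark the device
// as not available.
function isDeviceActive(state: LightState | undefined): boolean {
  return state !== undefined && state.device !== "na";
}
```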

Devices should also have sub-state messages such as "IMU:state:orientation:deviceid". This helps limit the impact of high-rate messages on the system, since clients that only need the basic state do not have to receive them.

As an optimization, if we want to minimize the size of the messages carrying state around the system, we can design additional bus services that support subscribing to just the deltas. CRDTs (conflict-free replicated data types) would be a good starting point for such a system.
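As a much simpler stand-in for a CRDT, here is a shallow delta sketch for flat state objects. It assumes one writer per topic and in-order delivery, which a CRDT would not require; `diffState` and `applyDelta` are hypothetical names, and `null` is reserved in a delta to mean "field removed".

```typescript
// Shallow delta for flat state objects. Not a CRDT: it assumes a single
// writer per topic and in-order delivery.
type Value = string | number | boolean;
type FlatState = Record<string, Value>;
type Delta = Record<string, Value | null>; // null = field removed

// Fields that were added or changed since `prev`, plus removals as null.
function diffState(prev: FlatState, next: FlatState): Delta {
  const delta: Delta = {};
  for (const key of Object.keys(next)) {
    if (prev[key] !== next[key]) delta[key] = next[key];
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) delta[key] = null;
  }
  return delta;
}

// Receiver side: rebuild the full state from the last state plus a delta.
function applyDelta(base: FlatState, delta: Delta): FlatState {
  const result: FlatState = { ...base };
  for (const [key, value] of Object.entries(delta)) {
    if (value === null) delete result[key];
    else result[key] = value;
  }
  return result;
}
```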

Disabling devices

Devices can either fail or be effectively turned off. In either case, we need to be able to propagate that state through the system.

If a device is intentionally removed from the system (turned off, reconfigured as another device, etc.), the device or its agent should update the state message for that device. In the case of lights, if a light were disabled, we would expect a "light:state:pwm8light" message with a payload such as {device:"na"}. Under the contract with clients, each client is then responsible for removing anything it has set up to work with that device.
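A sketch of the client side of this contract, assuming state topics of the form "light:state:<deviceId>" and a caller-supplied widget factory. `LightControls` and `LightWidget` are hypothetical. Widgets exist only for lights that have announced themselves, so a vehicle with no lights never gets any, and a {device:"na"} message tears the matching widget down.

```typescript
// Hypothetical widget handle created by the UI layer for one light.
interface LightWidget {
  deviceId: string;
  dispose(): void;
}

// Intended to be wired to a "light:state:*" subscription.
class LightControls {
  private readonly widgets = new Map<string, LightWidget>();
  private readonly createWidget: (deviceId: string) => LightWidget;

  constructor(createWidget: (deviceId: string) => LightWidget) {
    this.createWidget = createWidget;
  }

  onState(topic: string, payload: { device: string }): void {
    const deviceId = topic.split(":")[2]; // "light:state:<deviceId>"
    if (!deviceId) return;
    const existing = this.widgets.get(deviceId);
    if (payload.device === "na") {
      existing?.dispose(); // contract: the client tears down its own setup
      this.widgets.delete(deviceId);
    } else if (existing === undefined) {
      this.widgets.set(deviceId, this.createWidget(deviceId)); // first sighting
    }
  }

  activeDevices(): string[] {
    return Array.from(this.widgets.keys());
  }
}
```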

If a device fails, its agent is responsible for managing the device's health check and for sending the updated state message.

We have discussed adding a health monitoring service to the bus that is simply configured with which messages to treat as heartbeats, how long a device may go without a heartbeat before it is declared dead, and which message to send when the device fails its health check.
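A sketch of that health monitoring service, assuming time is passed in explicitly and each rule watches one heartbeat topic. `HealthMonitor`, `HealthRule`, and their field names are hypothetical. The heartbeat topic should differ from the failure topic; otherwise the failure message would itself count as a heartbeat.

```typescript
// Hypothetical configuration for one monitored device.
interface HealthRule {
  heartbeatTopic: string; // message treated as a sign of life
  timeoutMs: number;      // silence allowed before the device is declared dead
  failureTopic: string;   // where to publish on failure (not the heartbeat topic)
  failurePayload: unknown;
}

type Publish = (topic: string, payload: unknown) => void;

class HealthMonitor {
  private readonly rules: HealthRule[];
  private readonly publish: Publish;
  private readonly lastSeen = new Map<string, number>();
  private readonly failed = new Set<string>();

  constructor(rules: HealthRule[], publish: Publish, startMs: number) {
    this.rules = rules;
    this.publish = publish;
    for (const rule of rules) this.lastSeen.set(rule.heartbeatTopic, startMs);
  }

  // Feed every bus message through here; only watched topics matter.
  onMessage(topic: string, nowMs: number): void {
    if (this.lastSeen.has(topic)) {
      this.lastSeen.set(topic, nowMs);
      this.failed.delete(topic); // recovered; a later silence is reported again
    }
  }

  // Call periodically; each failure is published once until a heartbeat returns.
  check(nowMs: number): void {
    for (const rule of this.rules) {
      const seen = this.lastSeen.get(rule.heartbeatTopic) ?? -Infinity;
      if (nowMs - seen > rule.timeoutMs && !this.failed.has(rule.heartbeatTopic)) {
        this.failed.add(rule.heartbeatTopic);
        this.publish(rule.failureTopic, rule.failurePayload);
      }
    }
  }
}
```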

System with no lights

[Figure: Discover Lights sequence diagram (normal light discovery sequence)]

For an ROV that does not have lights installed, the clients start up and listen for light device state messages, but since none appear on the bus, they never create the UI elements for interacting with lights.

spiderkeys commented 7 years ago

Some observations on approaches to implementing this, assuming the pub/sub routing table is in place:

Choose a message definition format to use system-wide (.idl, .proto, etc.)
- Protobufs is a good candidate to start with, as it is supported in all of the languages we use
Create an MCU interface service (can be written in node, C++, whatever)
- Should be started by Cockpit as a service
- Can be networked to Cockpit with websockets, zeromq, etc.
Leave the comm interface to the FW unchanged for now
- i.e., keep the "field:value;" and "command( ...args );" messages
Have the MCU interface map the FW API to our agreed-upon message protocol
Once this is working, we can go back whenever we like and rev the FW<->MCU comm protocols, since they are private and external to the pub/sub system in cockpit.
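A sketch of the MCU interface's translation step for the legacy protocol, assuming telemetry arrives as a stream of "field:value;" records and that command arguments are comma-separated (the proposal only shows "command( ...args );"). `parseTelemetry`, `encodeCommand`, and the `topicFor` lookup are hypothetical; `topicFor` is where FW field names would be mapped onto namespaced bus topics.

```typescript
interface BusMessage {
  topic: string;
  payload: string; // raw value text; typing it is left to the mapping layer
}

// Parse complete "field:value;" records out of a serial buffer. The trailing
// unterminated fragment is returned as `rest` so it can be prefixed to the
// next read.
function parseTelemetry(
  buffer: string,
  topicFor: (field: string) => string | undefined,
): { messages: BusMessage[]; rest: string } {
  const records = buffer.split(";");
  const rest = records.pop() ?? "";
  const messages: BusMessage[] = [];
  for (const record of records) {
    const sep = record.indexOf(":");
    if (sep <= 0) continue; // empty or malformed record
    const topic = topicFor(record.slice(0, sep).trim());
    if (topic === undefined) continue; // field not mapped to the bus
    messages.push({ topic, payload: record.slice(sep + 1).trim() });
  }
  return { messages, rest };
}

// Encode an outgoing request in the legacy "command(arg1,arg2);" form.
// Comma-separated arguments are an assumption.
function encodeCommand(name: string, args: Array<string | number>): string {
  return `${name}(${args.join(",")});`;
}
```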

spiderkeys commented 7 years ago

Proposed message format:

```typescript
// Plain message shapes: interfaces avoid uninitialized class fields
interface ROVHeader {
   status: number; // 0 - Disposed, 1 - Active, 2 - Inactive
}

interface ROVMessage {
   topic: string;
   header: ROVHeader;
   payload: any;
}
```

From the firmware's point of view, each individual sensor is treated as a group of topics. Take the IMU as an example, with the following state messages:

IMU.State = { id: number, statusCode: number }
IMU.Fused = { id: number, roll: number, pitch: number, yaw: number }
IMU.Raw = { id: number, mag[ 3 ]: number, acc[ 3 ]: number, gyro[ 3 ]: number }
IMU.Mode = { id: number, mode: number }
IMU.Cal = { id: number, system: number, acc: number, gyro: number, mag: number }

On the firmware side, we condense topic names into the shortest possible descriptions. They do not need to be human readable, though they can be, since topic names are strings. We can use the following convention for topic names when implementing the "Requester Effector" pattern:

Current state: "topic"
Requested state: "topicr"
Target state: "topict"
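A sketch of the Requester/Effector convention, assuming numeric values and an effector that reaches the requested value immediately (real hardware would report the current value later). `requesterEffectorTopics` and `Effector` are hypothetical names; the "im"/"imr"/"imt" names mirror the firmware example.

```typescript
interface RETopics {
  current: string;   // "topic"
  requested: string; // "topicr"
  target: string;    // "topict"
}

function requesterEffectorTopics(base: string): RETopics {
  return { current: base, requested: base + "r", target: base + "t" };
}

type PublishValue = (topic: string, value: number) => void;

// Effector side: acknowledge a request by publishing the target, then
// publish the current value once it has changed.
class Effector {
  private current: number;
  private readonly topics: RETopics;
  private readonly publish: PublishValue;

  constructor(base: string, initial: number, publish: PublishValue) {
    this.current = initial;
    this.topics = requesterEffectorTopics(base);
    this.publish = publish;
  }

  onMessage(topic: string, value: number): void {
    if (topic !== this.topics.requested) return; // only requests drive changes
    this.publish(this.topics.target, value);     // acknowledge the request
    this.current = value;                        // assumed to complete instantly
    this.publish(this.topics.current, this.current);
  }

  get value(): number {
    return this.current;
  }
}
```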

Firmware pseudocode example:

```cpp
// Choose an ID
m_id = 3;

// Register the IMU by sending a state message to say we're alive ( "is": "IMU State" )
SERIAL_PRINTF( "is:%d|%d", m_id, static_cast<int>( EStatus::ALIVE ) );

// Check for a new message (may be null if nothing has arrived)
auto msg = NComms::GetMessage();

// Listen for mode switch command addressed to this IMU ( "imr": "IMU Mode Request" )
if( msg != nullptr && msg->IsType( "imr" ) && msg->arguments[ 0 ] == m_id )
{
   // Get requested mode
   int requestedMode = msg->arguments[ 1 ];

   // Send target mode as an acknowledgement ( "imt": "IMU Mode Target" )
   // Sending a negative value could represent an error code. Contextual decision to make
   SERIAL_PRINTF( "imt:%d|%d", m_id, requestedMode );

   // Do a bunch of things that eventually get us to that mode,
   // then record the new mode once the switch has completed
   m_mode = requestedMode;

   // Whenever a mode change occurs, print the current mode ( "im": "IMU Current Mode" )
   SERIAL_PRINTF( "im:%d|%d", m_id, m_mode );
}
```

On the MCU manager side, classes can be used to model the devices as implemented on the firmware side. Serialization between the MCU and the MCU interface is a separate concern from the serialization (if any) used on the cockpit bus, but the message information should be translatable between the two.

The MCU interface will translate firmware information into "ROVMessage" instances as described by the Cockpit system bus and handle publishing/subscribing to the API that the firmware implements. Eventually, the MCU interface should also be responsible for best-effort reliability of transport over the wire to the MCU. This could be a separate design discussion.
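A sketch of that translation, assuming firmware lines look like the pseudocode above ("im:3|2" meaning IMU 3 reports mode 2) and that bus topics follow the "deviceClass:message:deviceId" namespacing from the first comment. The table contents and `translateFirmwareLine` are hypothetical; the header and message shapes repeat the proposed format so the block stands alone.

```typescript
interface ROVHeader {
  status: number; // 0 - Disposed, 1 - Active, 2 - Inactive
}

interface ROVMessage {
  topic: string;
  header: ROVHeader;
  payload: unknown;
}

// Hypothetical table: firmware topic -> [bus topic prefix, payload field].
const FIRMWARE_TOPICS = new Map<string, [string, string]>([
  ["is", ["imu:state", "statusCode"]],
  ["im", ["imu:mode", "mode"]],
  ["imt", ["imu:mode:target", "mode"]],
]);

// Translate one "<fwTopic>:<id>|<value>" line into an ROVMessage, or return
// undefined for unknown topics and malformed lines.
function translateFirmwareLine(line: string): ROVMessage | undefined {
  const sep = line.indexOf(":");
  if (sep <= 0) return undefined;
  const mapping = FIRMWARE_TOPICS.get(line.slice(0, sep));
  if (mapping === undefined) return undefined;
  const fields = line.slice(sep + 1).split("|").map(Number);
  if (fields.length !== 2 || !fields.every(Number.isFinite)) return undefined;
  const [id, value] = fields;
  const [prefix, field] = mapping;
  return {
    topic: `${prefix}:${id}`,
    header: { status: 1 }, // Active: the firmware is reporting this device
    payload: { id, [field]: value },
  };
}
```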

OpenROV / openrov-software