FlowFuse / flowfuse

Connect, collect, transform, visualise, and interact with your Industrial Data in a single platform. Use FlowFuse to manage, scale and secure your Node-RED solutions.
https://flowfuse.com

MQTT Device Management #754

Closed sammachin closed 2 years ago

sammachin commented 2 years ago

Epic

464

Description

As a: User

I want: my devices to communicate with the forge application over MQTT

So that: I have a realtime channel and do not rely on polling.

This is dependent on #464 delivering the MQTT broker infrastructure.

Dependencies

Acceptance Criteria

sammachin commented 2 years ago

Do we want to keep polling as a fallback mechanism?

knolleary commented 2 years ago

> Do we want to keep polling as a fallback mechanism?

I think so. I wouldn't choose to rip it out straight away, as there will be a period when 'legacy' device agents that only know how to poll are still out there.

sammachin commented 2 years ago

@ Does this still need its own design work or can it be estimated and scheduled (tentatively) for development in 0.8?

knolleary commented 2 years ago

It does require some design work, although I think most of the unknowns are fairly well understood at this point.

Happy for this to go into 0.8.

knolleary commented 2 years ago

Updated 21/7 to add project to the status payload and update the topics used.

Let's start to fill out the design details for how devices will make use of the broker.

From #464 we have defined the top level topic structures.

Status

The Device status event has the form:

{
   "state": "<state>",
   "snapshot": "<snapshot>",
   "settings": "<settings>",
   "health": {
      "uptime": 123,
      "snapshotRestartCount": 1
   }
}

This is the same structure as the existing HTTP ping, which we need to stay consistent with. We haven't yet properly specified the health properties or formalised how they are used; we will need to come back to that.

The device publishes a status event whenever its local status changes, including:

The precise values of state are TBD.
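A minimal Python sketch of how an agent might assemble a status event and the topic it is published on. The topic pattern is the one used in the device lifecycle later in this thread; the function names are hypothetical, and "running" is used only as an illustrative state value since the real set of states is still TBD.

```python
import json


def status_topic(team: str, device: str) -> str:
    """Topic a device publishes status events to: ff/v1/<team>/d/<device>/status."""
    return f"ff/v1/{team}/d/{device}/status"


def build_status_payload(state: str, snapshot: str, settings: str,
                         uptime: int, snapshot_restart_count: int) -> str:
    """Serialise a status event with the same shape as the HTTP ping payload."""
    payload = {
        "state": state,  # precise set of state values is still TBD
        "snapshot": snapshot,
        "settings": settings,
        "health": {
            "uptime": uptime,
            "snapshotRestartCount": snapshot_restart_count,
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical team/device ids and placeholder hashes, for illustration only.
    print(status_topic("team1", "device1"))
    print(build_status_payload("running", "snapshot-abc", "settings-123", 123, 1))
```

Keeping the payload builder separate from the transport makes it easy to reuse the same body for both the MQTT status event and the HTTP ping, which need to stay consistent.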

Commands

Commands published by the platform are JSON blobs containing the type of command and any additional metadata that the particular command provides.

Currently, the only command the platform may send to a device is a notification that there is a new project snapshot to load.

update

{
   "command": "update",
   "project": "<project-id>",
   "snapshot": "<snapshot-id>",
   "settings": "<settings-hash>"
}
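A minimal Python sketch of decoding this command on the device side (Python is used for illustration only). The `UpdateCommand` type and `parse_command` function are hypothetical; rejecting unknown commands with an error, rather than ignoring them, is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpdateCommand:
    project: Optional[str]   # may be null, e.g. once the project is deleted
    snapshot: Optional[str]
    settings: Optional[str]


def parse_command(raw: bytes) -> UpdateCommand:
    """Decode a command message; only 'update' is defined at this stage."""
    msg = json.loads(raw)
    if msg.get("command") != "update":
        # Assumption: unknown commands are rejected rather than silently ignored.
        raise ValueError(f"unsupported command: {msg.get('command')!r}")
    return UpdateCommand(
        project=msg.get("project"),
        snapshot=msg.get("snapshot"),
        settings=msg.get("settings"),
    )
```

For example, `parse_command(b'{"command": "update", "project": "p1", "snapshot": "s1", "settings": "h1"}')` yields an `UpdateCommand` whose fields the agent can then compare with its local values.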

When the device receives this command, it must compare the snapshot and settings values with its locally stored values. If either differs, it must:

  1. publish an updating status message
  2. call the corresponding HTTP endpoint to get the new snapshot/settings
  3. apply the new values
  4. restart Node-RED
  5. publish a running status message
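
The five steps above could be sketched in Python as follows, assuming the side effects (publishing a status, the HTTP fetch, applying config, restarting Node-RED) are passed in as callables. All names here are hypothetical; the point is the ordering and the early exit when nothing has changed.

```python
from typing import Callable, Dict


def handle_update(
    local: Dict[str, str],
    command: Dict[str, str],
    publish_status: Callable[[str], None],
    fetch: Callable[[str, str], dict],
    apply_config: Callable[[dict], None],
    restart_node_red: Callable[[], None],
) -> bool:
    """Apply an 'update' command; return True if anything was changed."""
    if (command["snapshot"] == local["snapshot"]
            and command["settings"] == local["settings"]):
        return False  # both values match: nothing to do

    publish_status("updating")                                    # 1. announce update
    new_config = fetch(command["snapshot"], command["settings"])  # 2. HTTP fetch
    apply_config(new_config)                                      # 3. apply values
    local["snapshot"] = command["snapshot"]
    local["settings"] = command["settings"]
    restart_node_red()                                            # 4. restart Node-RED
    publish_status("running")                                     # 5. announce running
    return True
```

Injecting the side effects keeps the decision logic testable without a broker or a running Node-RED instance.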
knolleary commented 2 years ago

This is proving to be more involved than simply using MQTT in place of HTTP.

With the current topic structure, each device subscribes to its own 'command' topic. But the most common command to send is a notification that the snapshot a device should be using has changed, and that has to be sent to every device assigned to the project. This means the platform has to get the list of those devices and publish a separate message to each one. Having implemented it, it feels wrong: that's a lot of unnecessary work.

It would be better if the platform could publish one message to notify all devices assigned to the project of any change. But to achieve that, the devices would have to subscribe to a topic specific to the project - which in turn means:

I have updated the topic table in https://github.com/flowforge/flowforge/issues/464#issuecomment-1155300019 to reflect this.
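A minimal Python sketch contrasting the two designs. The topic patterns follow the .../d/<device>/command and .../p/<project>/command topics used elsewhere in this thread; the helper names are hypothetical.

```python
from typing import List, Optional, Sequence


def device_command_topic(team: str, device: str) -> str:
    return f"ff/v1/{team}/d/{device}/command"


def project_command_topic(team: str, project: str) -> str:
    return f"ff/v1/{team}/p/{project}/command"


def device_subscriptions(team: str, device: str, project: Optional[str]) -> List[str]:
    """Command topics a device listens on: always its own, plus its project's if assigned."""
    topics = [device_command_topic(team, device)]
    if project is not None:
        topics.append(project_command_topic(team, project))
    return topics


def snapshot_change_publishes(team: str, project: str, devices: Sequence[str],
                              per_device: bool) -> List[str]:
    """Topics the platform publishes to when a project's target snapshot changes."""
    if per_device:
        # Original design: look up every device in the project and message each one.
        return [device_command_topic(team, d) for d in devices]
    # Revised design: one message on the project topic reaches all subscribed devices.
    return [project_command_topic(team, project)]
```

With three devices in a project, the per-device design produces three publishes per snapshot change, while the project-topic design always produces one. The cost moves to the device, which must manage a second subscription whenever its project changes.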

Device MQTT lifecycle

  1. Agent starts. If a broker config has been provided, it enables the MQTT handler; if not, it falls back to HTTP polling.
  2. Agent publishes to ff/v1/<team>/d/<device>/status with its current snapshot/settings hashes (the project can be inferred from the snapshot) and a state of <to be defined>. This is the 'birth' message. The agent does not start running the project at this point.
  3. When the platform receives a device status message, it validates the snapshot/settings hashes. If the state is <to be defined>, or the snapshot/settings hashes are wrong, it publishes an update message, including the project id, to the device command topic.
  4. When the agent receives an update message on either its .../d/... or .../p/... topic, it compares the snapshot/settings/project with its local configuration.
    1. If everything matches, it starts the project if it is not already running.
    2. If the project has changed:
      1. unsubscribe from the old project command topic
      2. stop the launcher and delete the old project
    3. If the project is not null and the snapshot/settings have changed:
      1. get the settings/snapshot from the platform
      2. if the project has changed, subscribe to the new project command topic
      3. start the new project
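
Steps 1 and 4 of this lifecycle can be sketched in Python as pure functions that return a transport choice and an ordered list of actions, rather than performing them. The names and action strings are hypothetical. Skipping the unsubscribe step when the device previously had no project is an assumption the list above does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


def choose_transport(broker_config: Optional[dict]) -> str:
    """Step 1: use MQTT only when a broker config has been provided."""
    return "mqtt" if broker_config else "http-polling"


@dataclass
class LocalState:
    project: Optional[str]
    snapshot: Optional[str]
    settings: Optional[str]
    running: bool = False


def plan_update(local: LocalState, project: Optional[str],
                snapshot: Optional[str], settings: Optional[str]) -> List[str]:
    """Step 4: decide what to do with an update received on a .../d/... or .../p/... topic."""
    actions: List[str] = []
    if (project, snapshot, settings) == (local.project, local.snapshot, local.settings):
        if not local.running:
            actions.append("start project")                      # 4.1
        return actions

    project_changed = project != local.project
    if project_changed and local.project is not None:            # 4.2
        actions.append(f"unsubscribe project:{local.project}")
        actions.append("stop launcher and delete old project")

    if project is not None and (snapshot != local.snapshot
                                or settings != local.settings):  # 4.3
        actions.append("fetch snapshot/settings")
        if project_changed:
            actions.append(f"subscribe project:{project}")
        actions.append("start project")                          # (re)start
    return actions
```

For a device moving from project p1 to p2, the plan is to unsubscribe from p1, stop and delete it, fetch the new snapshot/settings, subscribe to p2 and start. When the update carries a null project, only the teardown steps remain.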

| Event | Response |
| --- | --- |
| Device is added to or removed from a project | Platform publishes to ff/v1/<team>/d/<device>/command to notify the device |
| Device settings modified (env vars) | Platform publishes to ff/v1/<team>/d/<device>/command to notify the device |
| Target snapshot is changed (including when deleted) | Platform publishes to ff/v1/<team>/p/<project>/command to notify all devices |
| Project deleted | Platform publishes to ff/v1/<team>/p/<project>/command to notify all devices (with project: null) |

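A minimal Python sketch of the platform-side routing in the event table above, mapping each event to the topic and update payload it publishes. The event name strings and the `route_event` function are hypothetical stand-ins for the table rows. Passing snapshot/settings through unchanged when a project is deleted is an assumption; the table only specifies project: null.

```python
import json
from typing import Optional, Tuple

# Hypothetical event names standing in for the rows of the event table.
DEVICE_EVENTS = {"device-project-changed", "device-settings-changed"}
PROJECT_EVENTS = {"target-snapshot-changed", "project-deleted"}


def route_event(event: str, team: str, *, device: Optional[str] = None,
                project: Optional[str] = None, snapshot: Optional[str] = None,
                settings: Optional[str] = None) -> Tuple[str, str]:
    """Return the (topic, JSON payload) the platform publishes for an event."""
    if event in DEVICE_EVENTS:
        topic = f"ff/v1/{team}/d/{device}/command"
    elif event in PROJECT_EVENTS:
        topic = f"ff/v1/{team}/p/{project}/command"
    else:
        raise ValueError(f"unknown event: {event}")
    if event == "project-deleted":
        project = None  # tells every subscribed device it no longer has a project
    payload = {"command": "update", "project": project,
               "snapshot": snapshot, "settings": settings}
    return topic, json.dumps(payload)
```

Note that the project topic is built before `project` is nulled, so a deletion notice still reaches the devices subscribed to the deleted project's topic.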
ZJvandeWeg commented 2 years ago

@sammachin Can this issue be updated? I think the milestone is incorrect, and that this has been fully delivered.