sammachin commented 2 years ago

Epic: #464

Description

As a: User

I want: my devices to communicate with the forge application over MQTT

So that: I have a realtime channel and do not rely on polling.

This is dependent on #464 delivering the MQTT broker infrastructure.

Dependencies

[x] #706
[x] https://github.com/flowforge/flowforge-device-agent/pull/22

Acceptance Criteria

[x] Update Forge application to communicate over MQTT
[x] Update Device agent code to use MQTT

sammachin commented 2 years ago

Do we want to keep polling as a fallback mechanism?

knolleary commented 2 years ago

> Do we want to keep polling as a fallback mechanism?

I think so. I wouldn't choose to rip it out straight away as we'll have a period of 'legacy' device agents out there that only know to do polling.

sammachin commented 2 years ago

@ Does this still need its own design work or can it be estimated and scheduled (tentatively) for development in 0.8?

knolleary commented 2 years ago

It does require some design work, although I think most of the unknowns are fairly well understood at this point.

Happy for this to go into 0.8.

knolleary commented 2 years ago

(Updated 21/7 to add the project to the status payload and update the topics used.)

Let's start to fill out the design details for how devices will make use of the broker.

From #464 we have defined the top level topic structures.

- Devices will subscribe to `ff/v1/<team>/d/<device>/command` to receive commands from the platform.
- Devices will publish to `ff/v1/<team>/d/<device>/status` to send their status to the platform.
  - [x] ~Include team hashid in the credentials object they are provided~ The broker username includes the teamId, so we can extract it from that rather than add yet another field to the credentials object.
- Devices will still use the existing HTTP endpoint to download project snapshots and settings; these will not be sent over MQTT.
- Devices will subscribe to `ff/v1/<team>/p/<project>/command` to receive commands the platform sends to all devices assigned to a given project.
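A minimal sketch of helpers that build these topics. The function names and the segment validation are hypothetical; only the topic shapes come from the list above. Rejecting `/`, `+` and `#` in ids is an added assumption, since MQTT treats them as a level separator and wildcards.

```typescript
// Hypothetical helpers for the ff/v1 topic layout; only the topic shapes
// come from the design, the function names are illustrative.
const TOPIC_PREFIX = "ff/v1";

// MQTT uses '/' as a level separator and '+'/'#' as wildcards, so an id
// containing any of them would address the wrong topic (or many topics).
function topicSegment(id: string): string {
  if (id === "" || /[\/+#]/.test(id)) {
    throw new Error(`invalid topic segment: ${JSON.stringify(id)}`);
  }
  return id;
}

// Commands addressed to a single device.
function deviceCommandTopic(team: string, device: string): string {
  return `${TOPIC_PREFIX}/${topicSegment(team)}/d/${topicSegment(device)}/command`;
}

// Status published by a single device.
function deviceStatusTopic(team: string, device: string): string {
  return `${TOPIC_PREFIX}/${topicSegment(team)}/d/${topicSegment(device)}/status`;
}

// Commands sent once to every device assigned to a project.
function projectCommandTopic(team: string, project: string): string {
  return `${TOPIC_PREFIX}/${topicSegment(team)}/p/${topicSegment(project)}/command`;
}

console.log(deviceCommandTopic("team1", "dev1")); // ff/v1/team1/d/dev1/command
```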

Status

The Device status event has the form:

{
   "state": "<state>",
   "snapshot": "<snapshot>",
   "settings": "<settings>",
   "health": {
      "uptime": 123,
      "snapshotRestartCount": 1
   }
}

This is the same structure as the existing HTTP ping, which we need to keep consistent with. We haven't properly specified the health properties or formalised how they are used; we will need to come back to that.
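One way the platform might validate an incoming status message is sketched below, assuming the field names of the status payload above. `state` is left as a plain string because its values are still TBD; the interface and function names are hypothetical.

```typescript
// Sketch of the status payload; `state` is a plain string because its
// values are still TBD, and the health fields are not yet formalised.
interface DeviceStatus {
  state: string;
  snapshot: string;
  settings: string;
  health: {
    uptime: number;
    snapshotRestartCount: number;
  };
}

// Platform-side check of an incoming status message. Returns null for
// anything that is not JSON or does not match the shape above.
function parseDeviceStatus(payload: string): DeviceStatus | null {
  let msg: unknown;
  try {
    msg = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const { state, snapshot, settings, health } = msg as Record<string, unknown>;
  if (typeof state !== "string" || typeof snapshot !== "string" || typeof settings !== "string") {
    return null;
  }
  if (typeof health !== "object" || health === null) return null;
  const { uptime, snapshotRestartCount } = health as Record<string, unknown>;
  if (typeof uptime !== "number" || typeof snapshotRestartCount !== "number") return null;
  return { state, snapshot, settings, health: { uptime, snapshotRestartCount } };
}
```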

The device agent publishes a status event whenever its local status changes, including:

- When the device agent first starts (with a random delay of 1-10 seconds to avoid reconnection storms)
- On any unexpected change in Node-RED state
- Before and after updating the local snapshot/settings

The precise values of state are TBD.
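The 1-10 second startup delay could be computed as in this sketch, which assumes a uniform random delay in whole milliseconds with both ends inclusive. The constant and function names, and the `publishStatus` callback, are hypothetical.

```typescript
// Hypothetical helper: pick the delay before the first status publish
// uniformly from 1 to 10 seconds (inclusive, whole milliseconds) so that
// many agents restarting at once do not all reconnect together.
const MIN_BIRTH_DELAY_MS = 1000;
const MAX_BIRTH_DELAY_MS = 10000;

function birthDelayMs(random: () => number = Math.random): number {
  // random() is in [0, 1), so the offset is in [0, MAX - MIN].
  const span = MAX_BIRTH_DELAY_MS - MIN_BIRTH_DELAY_MS + 1;
  return MIN_BIRTH_DELAY_MS + Math.floor(random() * span);
}

// Schedule the 'birth' status publish; publishStatus is supplied by the agent.
function scheduleBirthStatus(publishStatus: () => void): ReturnType<typeof setTimeout> {
  return setTimeout(publishStatus, birthDelayMs());
}
```

Injecting `random` keeps the helper deterministic under test without changing its behaviour in the agent.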

Commands

Commands published by the platform are JSON blobs containing the type of command and any additional metadata the particular command provides.

Currently, the only command the platform may send the device is a notification that there is a new project snapshot to load.

`update`

{
   "command": "update",
   "project": "<project-id>",
   "snapshot": "<snapshot-id>",
   "settings": "<settings-hash>"
}
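A possible agent-side parser for this command is sketched below. The names are hypothetical. Fields are typed as nullable because the event table later in this issue sends `project: null` when a project is deleted; allowing null for `snapshot` and `settings` as well is an assumption.

```typescript
// Hypothetical parser for the `update` command. Null snapshot/settings is an
// assumption; null project comes from the "Project deleted" event.
interface UpdateCommand {
  command: "update";
  project: string | null;
  snapshot: string | null;
  settings: string | null;
}

function isNullableString(x: unknown): x is string | null {
  return x === null || typeof x === "string";
}

// Returns null for malformed JSON and for commands this agent does not
// recognise, so newer platforms can add commands without breaking old agents.
function parseUpdateCommand(payload: string): UpdateCommand | null {
  let msg: unknown;
  try {
    msg = JSON.parse(payload);
  } catch {
    return null;
  }
  if (typeof msg !== "object" || msg === null) return null;
  const { command, project, snapshot, settings } = msg as Record<string, unknown>;
  if (command !== "update") return null;
  if (!isNullableString(project) || !isNullableString(snapshot) || !isNullableString(settings)) {
    return null;
  }
  return { command: "update", project, snapshot, settings };
}
```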

When the device receives this command it must compare the snapshot and settings values with its locally stored values. If either differs, then:

1. publish an updating status message
2. call the corresponding HTTP endpoint to get the new snapshot/settings
3. apply the new values
4. restart Node-RED
5. publish a running status message
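The five steps can be sketched as a single async handler with the side effects injected. The `UpdateSteps` interface and its method names are hypothetical, and `"updating"`/`"running"` are placeholders since the state values are still TBD.

```typescript
// The agent's side effects, injected so the sequence can be tested;
// all method names here are hypothetical.
interface UpdateSteps {
  publishStatus(state: string): Promise<void>;
  downloadSnapshotAndSettings(): Promise<void>; // existing HTTP endpoints
  applyConfig(): Promise<void>;
  restartNodeRed(): Promise<void>;
}

interface ConfigIds {
  snapshot: string | null;
  settings: string | null;
}

// Runs the update sequence only if the snapshot or settings differ from the
// locally stored values. Resolves to true if an update was performed.
async function handleUpdate(local: ConfigIds, incoming: ConfigIds, steps: UpdateSteps): Promise<boolean> {
  if (incoming.snapshot === local.snapshot && incoming.settings === local.settings) {
    return false;
  }
  await steps.publishStatus("updating"); // placeholder state name
  await steps.downloadSnapshotAndSettings();
  await steps.applyConfig();
  await steps.restartNodeRed();
  await steps.publishStatus("running"); // placeholder state name
  return true;
}
```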

knolleary commented 2 years ago

This is proving to be more involved than simply adding MQTT instead of HTTP.

With the current topic structure design, each device subscribes to its own 'command' topic. But the most common command to send is that the snapshot a device should be using has changed, and that has to be sent to every device assigned to the project. This means the platform has to get a list of all those devices and publish a message to each one. Having implemented it, it feels wrong: that's a lot of unnecessary work.

It would be better if the platform could publish one message to notify all devices assigned to the project of any change. But to achieve that, the devices would have to subscribe to a topic specific to the project - which in turn means:

- Devices need to know which project id they are assigned to (they don't currently get that info, just the snapshot id).
  - [ ] Add that to the status object returned by the deviceLive end-point and the 'update' messages
  - [ ] Store that information in the device project file
  - [ ] ...
- Devices need to dynamically subscribe to and unsubscribe from the project topic.
- The ACL handling needs to consider whether a device is allowed to subscribe to a project command topic. We could relax that to say a device in a team can subscribe to any project command topic.
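A sketch of what the relaxed ACL rule might look like, assuming the broker already knows each device's team and id from its credentials and that "any project topic" is scoped to the device's own team. The function names are hypothetical, and refusing wildcard subscriptions is an added assumption.

```typescript
// Hypothetical broker ACL checks for a device.
interface DeviceIdentity {
  team: string;
  device: string;
}

// Devices may subscribe to their own command topic and, under the relaxed
// rule, to the command topic of any project in their own team. Wildcard
// segments are refused so a device cannot listen to every project at once.
function canSubscribe(id: DeviceIdentity, topic: string): boolean {
  const parts = topic.split("/");
  if (parts.length !== 6 || parts.some((p) => p === "" || p === "+" || p === "#")) {
    return false;
  }
  const [ff, version, team, kind, target, verb] = parts;
  if (ff !== "ff" || version !== "v1" || team !== id.team || verb !== "command") {
    return false;
  }
  if (kind === "d") return target === id.device;
  if (kind === "p") return true; // relaxed: any project in the device's team
  return false;
}

// Devices only ever publish their own status.
function canPublish(id: DeviceIdentity, topic: string): boolean {
  return topic === `ff/v1/${id.team}/d/${id.device}/status`;
}
```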

I have updated the topic table in https://github.com/flowforge/flowforge/issues/464#issuecomment-1155300019 to reflect this.

- Launchers (which haven't been implemented yet) will now use `ff/v1/<team>/l/<project>/command` and `ff/v1/<team>/l/<project>/status` (note the `/l/`, indicating these are for the launcher).
- The topic `ff/v1/<team>/p/<project>/command` (note the `/p/`) is used to send commands to all devices assigned to this project.
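With three kinds of topic under each team, the platform needs to tell them apart when messages arrive. A minimal parsing sketch, with hypothetical names, that maps the `d`, `p` and `l` segments to device, project and launcher topics:

```typescript
// Hypothetical router helper: split an incoming topic into its parts so the
// platform can dispatch on whether it concerns a device, project or launcher.
type TopicKind = "device" | "project" | "launcher";
type TopicVerb = "command" | "status";

interface ParsedTopic {
  team: string;
  kind: TopicKind;
  id: string;
  verb: TopicVerb;
}

// A Map avoids accidentally matching inherited keys such as "constructor".
const KIND_BY_SEGMENT = new Map<string, TopicKind>([
  ["d", "device"],
  ["p", "project"],
  ["l", "launcher"],
]);

function isVerb(s: string): s is TopicVerb {
  return s === "command" || s === "status";
}

function parseTopic(topic: string): ParsedTopic | null {
  const parts = topic.split("/");
  if (parts.length !== 6 || parts[0] !== "ff" || parts[1] !== "v1") return null;
  const [, , team, segment, id, verb] = parts;
  const kind = KIND_BY_SEGMENT.get(segment);
  if (kind === undefined || !isVerb(verb) || team === "" || id === "") return null;
  return { team, kind, id, verb };
}
```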

Device MQTT lifecycle

1. The agent starts. If a broker config has been provided, it enables the MQTT handler; if not, it falls back to HTTP polling.
2. The agent publishes to `ff/v1/<team>/d/<device>/status` with its current snapshot/settings hashes (the project can be inferred from the snapshot) and a state of <to be defined>. This is the 'birth' message. It does not start the project running at this point.
3. When the platform receives a device status message, it validates that the snapshot/settings hashes are correct. If the state is <to be defined>, or the snapshot/settings hashes are wrong, it publishes an `update` message to the device command topic; this includes the project id.
4. When the agent receives an `update` message on either its `.../d/...` or `.../p/...` topic, it compares the snapshot/settings/project with its local configuration:
   1. If everything matches, it starts the project if it is not already running.
   2. If the project has changed:
      1. unsubscribe from the old project command topic
      2. stop the launcher and delete the old project
   3. If the project is not null and the snapshot/settings have changed:
      1. get the settings/snapshot from the platform
      2. if the project changed, subscribe to the new project command topic
      3. start the new project
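The agent's decision in the last step can be expressed as a pure function that returns the actions to perform, which keeps it testable apart from MQTT and HTTP. This is a sketch under stated assumptions: the action names are hypothetical, and a project change is treated as requiring a fresh download even if the snapshot/settings ids happen to match, since the old project has just been deleted.

```typescript
interface LocalState {
  project: string | null;
  snapshot: string | null;
  settings: string | null;
  running: boolean;
}

interface Update {
  project: string | null;
  snapshot: string | null;
  settings: string | null;
}

// Hypothetical action names; the agent would execute them in order.
type Action =
  | { type: "start" }
  | { type: "unsubscribe"; project: string }
  | { type: "stopAndDelete"; project: string }
  | { type: "download"; snapshot: string | null; settings: string | null }
  | { type: "subscribe"; project: string };

function planUpdate(local: LocalState, upd: Update): Action[] {
  const projectChanged = upd.project !== local.project;
  const configChanged = upd.snapshot !== local.snapshot || upd.settings !== local.settings;

  // Everything matches: only start the project if one is assigned and idle.
  if (!projectChanged && !configChanged) {
    return local.running || upd.project === null ? [] : [{ type: "start" }];
  }

  const actions: Action[] = [];
  // Leave the old project behind (nothing to do if there wasn't one).
  if (projectChanged && local.project !== null) {
    actions.push({ type: "unsubscribe", project: local.project });
    actions.push({ type: "stopAndDelete", project: local.project });
  }
  // A null project means the device is now unassigned: stop here.
  if (upd.project !== null) {
    actions.push({ type: "download", snapshot: upd.snapshot, settings: upd.settings });
    if (projectChanged) actions.push({ type: "subscribe", project: upd.project });
    actions.push({ type: "start" });
  }
  return actions;
}
```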

| Event | Response |
| --- | --- |
| Device is added to/removed from a project | Platform publishes to `ff/v1/<team>/d/<device>/command` to notify the device |
| Device settings modified (env vars) | Platform publishes to `ff/v1/<team>/d/<device>/command` to notify the device |
| Target snapshot is changed (including when deleted) | Platform publishes to `ff/v1/<team>/p/<project>/command` to notify all devices |
| Project deleted | Platform publishes to `ff/v1/<team>/p/<project>/command` to notify all devices (with `project: null`) |
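On the platform side, the table amounts to a mapping from event to topic. A sketch with a hypothetical discriminated union of event names; the compiler's exhaustiveness check flags any event type added later without a routing rule.

```typescript
// Hypothetical platform events corresponding to the rows of the table.
type PlatformEvent =
  | { kind: "deviceAssignmentChanged"; team: string; device: string }
  | { kind: "deviceSettingsChanged"; team: string; device: string }
  | { kind: "targetSnapshotChanged"; team: string; project: string }
  | { kind: "projectDeleted"; team: string; project: string };

function topicFor(e: PlatformEvent): string {
  switch (e.kind) {
    // Device-specific changes go to that device's own command topic.
    case "deviceAssignmentChanged":
    case "deviceSettingsChanged":
      return `ff/v1/${e.team}/d/${e.device}/command`;
    // Project-wide changes are published once for all assigned devices.
    case "targetSnapshotChanged":
    case "projectDeleted":
      return `ff/v1/${e.team}/p/${e.project}/command`;
    default: {
      const unreachable: never = e;
      throw new Error(`unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```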

ZJvandeWeg commented 2 years ago

@sammachin Can this issue be updated? I think the milestone is incorrect and it has been fully delivered?


MQTT Device Management #754
