Improve horizontal scalability of core forge app

knolleary commented 9 months ago

Description

As we build experience of running FF with an increasing workload, we need to look at how it will continue to scale.

The most immediate solution to scaling is to run two instances of the app with load-balancing in front to distribute the work. However, there are a few blockers to being able to do this.

This epic is where we will identify and document those blockers, then spin off separate tasks to address them.

- [ ] https://github.com/flowforge/flowforge/issues/2514
- [ ] https://github.com/flowforge/flowforge/issues/2792
- [ ] https://github.com/flowforge/flowforge/issues/2793
- [ ] https://github.com/FlowFuse/flowfuse/pull/3417
- [ ] https://github.com/FlowFuse/flowfuse/pull/3418
- [ ] https://github.com/FlowFuse/flowfuse/issues/3332
- [ ] https://github.com/FlowFuse/flowfuse/issues/3367
- [ ] https://github.com/FlowFuse/flowfuse/issues/3426
- [x] ~Scalability: SSO session state management~ no changes needed
- [ ] https://github.com/FlowFuse/helm/issues/355
- [ ] https://github.com/FlowFuse/flowfuse/issues/3642
- [ ] https://github.com/FlowFuse/flowfuse/issues/3847
- [ ] Test scalability on staging

### Future Tasks - not critical path
- [ ] https://github.com/flowforge/flowforge/issues/2794

hardillb commented 9 months ago

With K8s ingress we can direct different paths to different backend instances if needed
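As a rough illustration of what path-based routing means here, the sketch below mimics prefix matching in the style of a Kubernetes Ingress `pathType: Prefix` rule: paths are compared segment by segment and the longest matching prefix wins. The paths and backend names are hypothetical, picked only for the example, and this is not an actual ingress configuration.

```python
def match_backend(path, rules):
    """Return the backend whose prefix matches `path` segment-wise, preferring the longest prefix."""
    request_parts = [p for p in path.split("/") if p]
    best_backend = None
    best_len = -1
    for prefix, backend in rules.items():
        prefix_parts = [p for p in prefix.split("/") if p]
        # A prefix matches only on whole path segments, so "/api/v1/devices"
        # does not match "/api/v1/devicesX".
        if request_parts[:len(prefix_parts)] == prefix_parts and len(prefix_parts) > best_len:
            best_backend, best_len = backend, len(prefix_parts)
    return best_backend


# Hypothetical paths and backend names, for illustration only.
RULES = {
    "/": "forge-app",
    "/api/v1/devices": "forge-device-api",
}

device_backend = match_backend("/api/v1/devices/abc/live/state", RULES)
default_backend = match_backend("/account/settings", RULES)
```

With rules like these, device traffic could be sent to a dedicated set of instances while everything else falls through to the default backend.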

Steve-Mcl commented 9 months ago

NOTE: The following is a very early idea, barely fleshed out, and ultimately may not be viable for many reasons. I wanted to share it early doors in case there is any merit in it and it helps us avoid travelling one path, only to change direction when it doesn't solve the issues.

I had thought about this (for different reasons) some time ago. My thought experiment was to add layers between the core fuse application and the devices. Let's call it a "FlowFuse Agent" or "FFA" for ease of discussion.

The FFA would be able to support 1 to n connections, with n being a soft target chosen for the best ratio of performance vs distribution.

There could be 1 or multiple FFAs (for horizontal scaling)

The FFAs aggregate / tunnel comms to the cloud / master Fuse app

Supporting on-prem means the majority of traffic (especially for multiple devices pulling snapshots, multiple device tunnels) stays on site (or in the case of cloud, stays off the main Fuse App)

These FFAs could/should provide resilience (e.g. run them at 75% design capacity, if one fails, the others take over)
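To make the 75% figure concrete, here is a small worked calculation. It assumes (my assumption, not something stated above) that load is spread evenly across the FFAs and that a failed FFA's share is redistributed evenly over the survivors. The function names are hypothetical.

```python
def utilisation_after_failure(agents, normal_utilisation, failed=1):
    """Utilisation of each surviving agent once `failed` agents drop out."""
    survivors = agents - failed
    if survivors <= 0:
        raise ValueError("no agents left to absorb the load")
    # Total load is agents * normal_utilisation, shared by the survivors.
    return agents * normal_utilisation / survivors


def min_agents(normal_utilisation, failed=1):
    """Smallest agent count whose survivors stay at or below 100% utilisation."""
    if not 0 < normal_utilisation < 1:
        raise ValueError("normal_utilisation must be between 0 and 1")
    agents = failed + 1
    while utilisation_after_failure(agents, normal_utilisation, failed) > 1.0:
        agents += 1
    return agents


three_agents = utilisation_after_failure(3, 0.75)  # 1.125, i.e. overloaded
four_agents = utilisation_after_failure(4, 0.75)   # 1.0, i.e. just fits
needed = min_agents(0.75)                          # 4
```

Under those assumptions, running at 75% only survives a single failure with four or more FFAs; with fewer, the target utilisation would need to be lower.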

The FFAs would host the MQTT broker and ACLs, reducing the hits on the Fuse App to just first-time loading (and dynamic updates)

While out of scope at this point, for complete vision of the approach:

- The FFAs could live on cloud or on-prem
- Could perform other duties like proxying, caching NPM modules, MQTT pass through to neighbouring FFAs (for shortcut / on-prem traffic), etc.
- Could potentially be instance runners (K8s nodes)
- Could be a saleable item for corporate or enterprise customers who want extra resilience or extra security (by running FFAs local to each Dept)

--

This architecture is quite common in Manufacturing where hundreds of tools on the shop floor communicate with a local aggregator on the internal (private) network. The aggregator orders and streams data to its parent, etc.

For on-site Eng/IT/IS, the obvious benefits are security/simplification (not exposing devices to internet/no special VLANs/proxies/network provisioning), resilience against internet outage.

For the FF app, the benefits come in the form of much reduced traffic and fewer connections.

knolleary commented 9 months ago

I'm going to split this out into separate stories for the four topics identified - all linked from the task list above.

hardillb commented 9 months ago

One we missed: database model updates need to run on only one instance, or we get race conditions.
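One possible way to get that "only one instance" behaviour is to have each instance try to claim a migration by inserting a row with a unique key, so the database itself picks the winner. The sketch below is not taken from the FlowFuse codebase: it uses SQLite purely as a stand-in so it runs on its own, and the table, column, and instance names are hypothetical.

```python
import sqlite3


def run_migration_once(conn, version, instance_id, migrate):
    """Run `migrate` only if this instance is first to claim `version`. Returns True if it ran."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations ("
        "version TEXT PRIMARY KEY, applied_by TEXT NOT NULL)"
    )
    with conn:  # commits on success, rolls back if migrate() raises
        try:
            # The primary key makes the claim atomic: a second insert fails.
            conn.execute(
                "INSERT INTO applied_migrations (version, applied_by) VALUES (?, ?)",
                (version, instance_id),
            )
        except sqlite3.IntegrityError:
            return False  # another instance already claimed this migration
        migrate(conn)
    return True


def add_example_column(conn):
    # Hypothetical model update, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS projects (id INTEGER PRIMARY KEY)")
    conn.execute("ALTER TABLE projects ADD COLUMN example_flag INTEGER DEFAULT 0")


conn = sqlite3.connect(":memory:")
results = [
    run_migration_once(conn, "2023-01-example", name, add_example_column)
    for name in ("forge-1", "forge-2")
]
```

Here the two calls run one after the other on a single connection, so only the first applies the change. With genuinely concurrent instances against a server database, the losing instance would also need to wait for the winner to finish before serving requests, which this sketch does not cover.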

ZJvandeWeg commented 3 months ago

@MarianRaphael @joepavitt Can you update this epic? I think there are some issues and some things missing.

knolleary commented 3 months ago

@ZJvandeWeg this was on me to update from our conversation last week. Will make it so.

joepavitt commented 3 months ago

From a scheduling perspective, when do we expect these outstanding tasks to be worked on @MarianRaphael @knolleary?

FlowFuse / flowfuse

Improve horizontal scalability of core forge app #2782