FlowFuse / flowfuse

Connect, collect, transform, visualise, and interact with your Industrial Data in a single platform. Use FlowFuse to manage, scale and secure your Node-RED solutions.
https://flowfuse.com

Scalability: Broker ACL checks #2794

Open knolleary opened 1 year ago

knolleary commented 1 year ago

Description

Part of #2782

Unlike the other items under the scalability banner, this item is more about easing some pressure on the forge app, rather than improving its scalability.

The main 'background' load on the forge app is handling the Device Agent check-ins. There are two parts to this:

  1. The ACL check made by the broker to verify the device is allowed to publish
  2. Handling the status update, which means updating the database

We already have request caching in the broker auth plugin to try to minimise the load here, but given the load we are observing in production, there could be room for tuning. We still need to do the ACL checks, but increasing some of the caching settings in the broker should ease some of the pressure. We also have a bit of a 'thundering herd' issue: the work is driven by the devices when they connect, and the recurring check-ins happen at an interval from that point, so devices that connect at the same time tend to keep checking in at the same time.

We have added some jitter to the device-agent check-in times; we should look at increasing the jitter range so that the messages arrive spread over a wider window.
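To illustrate why a wider jitter range helps, here is a minimal simulation sketch. It is not the device agent's actual code: the function names, the 30 second base interval, and the jitter values are all assumptions chosen for illustration. It models a group of devices that all connect at the same moment and counts how many check-ins land in the busiest one-second window.

```python
import random
from collections import Counter
from typing import Optional


def next_checkin_delay(base_seconds: float, jitter_seconds: float,
                       rng: Optional[random.Random] = None) -> float:
    """Delay before the next check-in: base interval plus uniform jitter.

    Hypothetical helper; the real agent's interval and jitter logic may differ.
    """
    source = rng if rng is not None else random
    return base_seconds + source.uniform(0, jitter_seconds)


def peak_arrivals_per_second(n_devices: int, base_seconds: float,
                             jitter_seconds: float, rng: random.Random) -> int:
    """Largest number of check-ins in any one-second bucket, assuming
    every device connected at time zero (the thundering-herd case)."""
    buckets = Counter(
        int(next_checkin_delay(base_seconds, jitter_seconds, rng))
        for _ in range(n_devices)
    )
    return max(buckets.values())


if __name__ == "__main__":
    rng = random.Random(42)
    narrow = peak_arrivals_per_second(1000, 30.0, 1.0, rng)
    wide = peak_arrivals_per_second(1000, 30.0, 10.0, rng)
    print(f"peak with 1s jitter:  {narrow} check-ins/second")
    print(f"peak with 10s jitter: {wide} check-ins/second")
```

With the same number of devices, the wider range spreads the check-ins over more one-second buckets, so the peak the forge app has to absorb is lower.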

There are some other options such as creating a custom auth plugin for mosquitto that can do some more localised checking without having to hit the forge app. For example, the Project nodes publish to a topic structure of ff/v1/<team>/p/<project>/out/+/#. If a project node publishes to ff/v1/<team>/p/<project>/out/foo/1 and ff/v1/<team>/p/<project>/out/foo/2 that will drive two ACL checks. In reality, we only need to do a check for the stem of the topic (ff/v1/<team>/p/<project>/out/) - a custom auth plugin could deal with that locally and allow the local caching to match for both requests. (I've described this badly... but I know what I mean...).
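The stem idea can be sketched roughly as follows. This is only an illustration in Python: a real Mosquitto auth plugin would be written against the broker's plugin interface, and `topic_stem`, `StemCachingAcl`, and the `check_remote` callback (standing in for the request to the forge app) are hypothetical names, not existing FlowFuse code. The sketch also assumes that permission is uniform for everything under the `out/` stem, and it leaves out cache expiry.

```python
from typing import Callable, Dict, Tuple


def topic_stem(topic: str) -> str:
    """Reduce ff/v1/<team>/p/<project>/out/... to its stem.

    Topics that do not match that shape are returned unchanged.
    """
    parts = topic.split("/")
    matches = (
        len(parts) >= 7
        and parts[0] == "ff" and parts[1] == "v1"
        and parts[3] == "p" and parts[5] == "out"
        and parts[6] != ""
    )
    if matches:
        return "/".join(parts[:6]) + "/"
    return topic


class StemCachingAcl:
    """Caches ACL decisions per (client, topic stem) instead of per full topic."""

    def __init__(self, check_remote: Callable[[str, str], bool]) -> None:
        self._check_remote = check_remote  # hypothetical call to the forge app
        self._cache: Dict[Tuple[str, str], bool] = {}
        self.remote_calls = 0

    def can_publish(self, client_id: str, topic: str) -> bool:
        key = (client_id, topic_stem(topic))
        if key not in self._cache:
            # Cache miss: ask the remote check once for the whole stem.
            self.remote_calls += 1
            self._cache[key] = self._check_remote(client_id, key[1])
        return self._cache[key]


if __name__ == "__main__":
    acl = StemCachingAcl(lambda client, stem: stem.startswith("ff/v1/team1/"))
    acl.can_publish("proj1", "ff/v1/team1/p/proj1/out/foo/1")
    acl.can_publish("proj1", "ff/v1/team1/p/proj1/out/foo/2")
    print(acl.remote_calls)  # both publishes share one remote check
```

Because both topics reduce to the same stem, the second publish is answered from the local cache and only one check reaches the remote side.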

Epic/Story

2782

Have you provided an initial effort estimate for this issue?

I have provided an initial effort estimate

ZJvandeWeg commented 7 months ago

@knolleary I seem to remember this task had been completed already, can you confirm?

knolleary commented 7 months ago

@ZJvandeWeg as the issue describes, this was more focussed on easing the pressure on the forge app by making our existing ACL checks more efficient. It is not a prerequisite to horizontal scaling of the app. When we looked more closely at the load the ACL checks generate, we decided it was currently manageable, so it didn't need any immediate action whilst we worked on addressing the blockers to horizontally scaling the app.

So this issue is still open, but remains on the backlog as I don't believe we need it in the short term. The options described here remain valid to pick up if the need arises.