Open knolleary opened 1 year ago
@knolleary I seem to remember this task had been completed already, can you confirm?
@ZJvandeWeg as the issue describes, this was more focussed on easing the pressure on the forge app by making our existing ACL checks more efficient. It is not a prerequisite to horizontally scaling the app. When we looked more closely at the load the ACL checks generate, we decided it was currently manageable, so it didn't need any immediate action whilst we worked on addressing the blockers to horizontally scaling the app.
So this issue is still open, but remains on the backlog as I don't believe we need it in the short term. The options it describes remain valid to pick up if the need arises.
Description
Part of #2782
Unlike the other items under the scalability banner, this item is more about easing some pressure on the forge app, rather than improving its scalability.
The main 'background' load on the forge app is handling the Device Agent check-ins. There are two parts to this:
We already have request caching in the broker auth plugin to try to minimise the load here, but given the load we are observing in production, there could be room for tuning it to ease some of the pressure. We still need to do the ACL checks, but by increasing some of the caching settings in the broker we can reduce how many of them reach the forge app. We also have a bit of a 'thundering herd' issue: the work is driven by the devices when they connect, and the recurring check-ins happen at an interval from that point, so devices that connect at the same time tend to keep checking in at around the same time.
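To illustrate why longer cache lifetimes reduce the load on the forge app, here is a minimal Python sketch of a time-to-live (TTL) cache in front of an ACL check. It is not the broker auth plugin's actual code or settings; the `AclCache` class, the `(username, topic, access)` key and the `check` callable are hypothetical, with `check` standing in for the request to the forge app.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical cache key: (client username, topic, access type).
AclKey = Tuple[str, str, str]


class AclCache:
    """Remembers ACL decisions for `ttl` seconds so repeat checks skip the backend."""

    def __init__(
        self,
        check: Callable[[str, str, str], bool],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check  # the expensive call, e.g. a request to the forge app
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[AclKey, Tuple[bool, float]] = {}
        self.backend_calls = 0  # how many checks actually reached the backend

    def allowed(self, username: str, topic: str, access: str) -> bool:
        key = (username, topic, access)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh: answer locally
        # Missing or expired: ask the backend and store the answer with a new expiry.
        self.backend_calls += 1
        result = self._check(username, topic, access)
        self._entries[key] = (result, now + self._ttl)
        return result
```

With a 60-second TTL, a client publishing to the same topic every few seconds causes one backend call per minute rather than one per message. The trade-off is that a change to a client's permissions can take up to one TTL to take effect, which is the main thing to weigh when increasing the cache settings.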
We have added some jitter to the device-agent check-in times; we should look at increasing the jitter range to spread out the times at which the messages arrive.
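The effect of widening the jitter range can be sketched with a small simulation. This is a minimal Python illustration, not the device agent's implementation: it assumes each check-in is scheduled at a base interval plus a uniform random offset, and the function names, the 60-second interval and the jitter values are hypothetical. It models only the first check-in after every device reconnects at the same moment, which is the thundering-herd case.

```python
import random
from typing import Dict


def next_checkin_delay(interval: float, jitter: float, rng: random.Random) -> float:
    """Base interval plus a random offset drawn uniformly from [0, jitter]."""
    return interval + rng.uniform(0.0, jitter)


def peak_arrivals_per_second(devices: int, interval: float, jitter: float, seed: int = 1) -> int:
    """Largest number of check-ins landing in any one-second bucket.

    Assumes all `devices` connect at t=0, so with no jitter they all
    check in during the same second.
    """
    rng = random.Random(seed)
    buckets: Dict[int, int] = {}
    for _ in range(devices):
        second = int(next_checkin_delay(interval, jitter, rng))
        buckets[second] = buckets.get(second, 0) + 1
    return max(buckets.values())


if __name__ == "__main__":
    # Compare the worst one-second burst for increasing jitter ranges.
    for jitter in (0, 10, 60):
        print(jitter, peak_arrivals_per_second(1000, 60, jitter))
```

With no jitter every check-in lands in the same second; as the range grows, the same number of requests is spread across more seconds, so the peak the forge app has to absorb drops roughly in proportion to the range.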
There are some other options, such as creating a custom auth plugin for Mosquitto that can do more localised checking without having to hit the forge app. For example, the Project nodes publish to a topic structure of:

ff/v1/<team>/p/<project>/out/+/#

If a project node publishes to ff/v1/<team>/p/<project>/out/foo/1 and ff/v1/<team>/p/<project>/out/foo/2, that will drive two ACL checks. In reality, we only need to check the stem of the topic (ff/v1/<team>/p/<project>/out/); a custom auth plugin could deal with that locally, so that the local cache matches both requests. (I've described this badly... but I know what I mean...)

Epic/Story
2782
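To make the topic-stem idea from the description concrete, here is a minimal Python sketch of the cache-key logic only. A real version would live in a Mosquitto auth plugin (typically written in C against the broker's plugin interface); the function names, the "write" access label and the `backend` callable are hypothetical. It assumes, as the description says, that permission to publish under ff/v1/<team>/p/<project>/out/ is decided by the stem alone, so one answer can safely be reused for every topic below it.

```python
from typing import Callable, Dict, Tuple


def acl_cache_topic(topic: str) -> str:
    """Map a project-node publish topic to its stem, ff/v1/<team>/p/<project>/out/.

    Topics that don't have that shape are returned unchanged.
    """
    parts = topic.split("/")
    if (
        len(parts) > 6
        and parts[0] == "ff"
        and parts[1] == "v1"
        and parts[3] == "p"
        and parts[5] == "out"
    ):
        return "/".join(parts[:6]) + "/"
    return topic


def make_checker(backend: Callable[[str, str, str], bool]) -> Callable[[str, str, str], bool]:
    """Wrap an ACL backend with a cache keyed on the topic stem for publishes."""
    cache: Dict[Tuple[str, str, str], bool] = {}

    def allowed(username: str, topic: str, access: str) -> bool:
        # Only publishes ("write") collapse to the stem; other access keeps the full topic.
        topic_key = acl_cache_topic(topic) if access == "write" else topic
        key = (username, topic_key, access)
        if key not in cache:
            # On a miss the backend still sees the real topic; its answer is
            # then reused for every other topic under the same stem.
            cache[key] = backend(username, topic, access)
        return cache[key]

    return allowed
```

Publishing to .../out/foo/1 and then .../out/foo/2 produces a single backend call. For brevity the cache entries here never expire; in practice this keying would be combined with the broker's existing time-based caching.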
Have you provided an initial effort estimate for this issue?
I have provided an initial effort estimate