Closed — Kkoile closed this issue 1 month ago
The event-queue does detect this. It fetches all onboarded tenants from cds-mtxs and runs for each of those tenants.
Since the event-queue uses setTimeout to schedule jobs internally, a job for a tenant may already have been scheduled within the configured runInterval
(default 25 min) before the tenant offboarded. These errors are all caught and only produce logs; they should not harm anything, it's really a cosmetic issue. Does this match your observation? Or are there any other issues with this?
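To illustrate the mechanism described above, here is a minimal sketch (function names and structure are illustrative, not the plugin's actual code): jobs are scheduled via setTimeout within the runInterval, and any error thrown while processing a tenant is caught and logged rather than rethrown.

```javascript
const RUN_INTERVAL_MS = 25 * 60 * 1000; // default runInterval of 25 min

// hypothetical scheduler: runEvents(tenantId) does the actual event processing
function scheduleTenantRun(tenantId, runEvents, { delayMs, logger = console } = {}) {
  // a job scheduled before the tenant offboards will still fire once
  const delay = delayMs ?? Math.floor(Math.random() * RUN_INTERVAL_MS);
  return setTimeout(async () => {
    try {
      await runEvents(tenantId);
    } catch (err) {
      // e.g. DB authentication failures for already-offboarded tenants:
      // caught here, so the only visible effect is this log line
      logger.error(`event processing failed for tenant ${tenantId}: ${err.message}`);
    }
  }, delay);
}
```

This is why an offboarded tenant produces error logs until the next tenant refresh or a restart: the timer was armed while the tenant still existed.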
> These errors are all caught and only produce logs. But should not harm anything, it's actually a cosmetic issue.
Exactly, it's not impacting anything, as these are just logs. But it spams our logs, which is unnecessary and could be avoided. It also gives the impression that something is not working as it should; only after looking up the tenant IDs does it become clear that these logs can be ignored.
I understand your point. For this to work, cds-mtxs would need to distribute the offboard event to all instances; unfortunately, it does not support cross-instance messaging. In the meantime, I could intercept those errors and prevent them from being logged (as you said). I will investigate this further.
The current plan is to register on the offboard events of mtxs and federate this event to all application instances (which is what mtxs should actually be doing; this workaround will be removed as soon as cds has implemented it). All instances will react to this event and cancel all planned events. This won't eliminate all error messages, but it will reduce them to a bare minimum.
In our logging system we see a lot of logs like the following:
The mentioned tenant has unsubscribed from our application. Hence, I guess the event-queue plugin does not recognize when a tenant unsubscribes and keeps trying to handle events for it until the server instance restarts.
At first I thought simply hooking into the mtxs event handlers would be good enough, but in a multi-instance setup one would need to distribute the unsubscription event among all instances.
So I think the best approach would be to catch the authentication-failed error when connecting to the DB and check with the service manager whether the tenant still exists.
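A minimal sketch of that suggested check (all names are hypothetical; `listTenants` stands in for whatever service-manager call returns the currently existing tenants): on an authentication failure, verify the tenant still exists before emitting an error log.

```javascript
// returns "offboarded" when the failure is explained by a deleted tenant,
// "error" when it is a genuine problem that should be logged
async function handleDbError(err, tenantId, { listTenants, logger }) {
  const isAuthError = /authentication/i.test(err.message);
  if (isAuthError) {
    // hypothetical service-manager lookup of existing tenant instances
    const tenants = await listTenants();
    if (!tenants.includes(tenantId)) {
      return "offboarded"; // suppress the misleading error log
    }
  }
  logger.error(`event processing failed for tenant ${tenantId}: ${err.message}`);
  return "error";
}
```

The extra service-manager round trip only happens on the (rare) authentication-failure path, so genuine errors for existing tenants would still be logged as before.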