Closed oxzi closed 2 months ago
After talking with @sukhwinder33445 about this change as he implements the web counterpart, I have removed the incr_cfg_delete
function and its trigger usage, rewriting a DELETE
to an UPDATE
setting deleted = 'y'
on the main configuration tables.
For historical reasons and if it might become relevant again, I have attached the diff below.
My thinking when suggesting to not have changed_at
/deleted
in every config table but only in the main ones was a bit that should also be the natural thing to do in Web. For example, if you add an recipient somewhere in an escalation, you do this on the edit event rule page and you click save there, so there it would only be necessary to update rule.changed_at
when the form is saved, all the changes to other tables would be covered implicitly by that.
Now this turned in to quite a bit of complex trigger mechanics instead. I would definitely consider adding these two columns to all tables instead of requiring three variants of the trigger functions. Is everything you implemented in PostgreSQL triggers also possible with MySQL and MariaDB?
as well as raising the
deleted
column instead of actually performing theDELETE
command.
~Does raising have any special meaning in this context? Or is it simply supposed to say SET deleted = 'y'
?~
~Also, I'm not yet sure how good the idea is to hook DELETE
in a way that it doesn't actually delete, i.e. might this be too surprising so that this could result in unexpected behavior? The only downside I see with executing an UPDATE ... SET deleted = 'y'
instead would be that with the current current distinction into main/child config tables, you'd have to make a difference between those. If we'd decide to add the columns to every table, this would no longer be the case. (But on the other hand, there would then be a difference between adding a new group membership and reviving a previously deleted one.)~
Edit: this part became irrelevant by itself, I wrote this comment before seeing the previous comment.
I would say that if for example the rule
is updated via form
(rule.name ="UpdatedName"), the changed_at
column should be updated automatically. Maridb/Mysql can do this with ON UPDATE. If an INSERT/UPDATE/DELETE takes place in the related (child) tables, the web should (can) update the changed_at
column of the main(parent) table manually.
I would say that if for example the
rule
is updated viaform
(rule.name ="UpdatedName"), thechanged_at
column should be updated automatically.If an INSERT/UPDATE/DELETE takes place in the related (child) tables, the web should (can) update the
changed_at
column of the main(parent) table manually.
Why do you want to make a distinction here? Would that make anything easier for web?
Maridb/Mysql can do this with ON UPDATE.
Does this even work here? Reads like this is a magic feature just for TIMESTAMP
and DATETIME
which we don't use.
Why do you want to make a distinction here? Would that make anything easier for web?
Yes. For example, if the rule name is changed, the form returns new values that we can write directly to $db->update('rule', $formValues);
. But now $formValues also needs the column changed_at
. Adding this column is no big deal, but if db can do it itself, why should we do it manually. This will also prevent future errors if someone forgets to add this column to the $formValues.
Does this even work here? Reads like this is a magic feature just for TIMESTAMP and DATETIME which we don't use.
But changed_at
is handled as timestamp
.
Update:
I thought it would be simpler than the changes on the web, but it's not, and it's getting more complex with the triggers. And the data type is not a real timestamp
. I can understand your reasoning now.
Web should do all that manually, and the triggers can be removed.
While the triggers allowed me to offload a lot of complexity to the database, it now went back to the notification daemon. I hope that I have found most of the obvious bugs that can occur, especially during partial updates. This version now works "trigger-free" and has the two columns in all tables, including relationship tables. Please give it a try and report back.
Small note, mostly for future me: I have just rebased against the current main
, nothing more. I am going to address @yhabteab's comments later. Furthermore, I talked with @julianbrost about the ConfigSet
, which could contain every table/relation instead of groups of relations, easing fetching logic and allowing more unification.
TODO: Fetch each table by its own. This "flattening" should ease the synchronization and might even allow more code reusage.
Are the sql
files missing by mistake, or have you decided for a new implementation?
I assume that all tables mentioned here and the referenced(child) tables get the deleted
and changed_at
flag (as in the recently removed sql
file).
The implementation in the web is based on this assumption.
@sukhwinder33445: Sorry, the schema and an update are now part of the work in progress commit.
You have defined json tags for all ChangedAt and Deleted fields, but seem to ignore them completely in all update function hooks. Is that intentional? If so, then the json tags are probably useless.
I have completed only half of the changes so far! But what about this question?
@yhabteab: I have already removed the json
tags as they were, as you mentioned, useless. My initial idea was to refer to them within the JSON dump, but as most other fields are missing such a tag, I removed them here as well.
You have defined json tags for all ChangedAt and Deleted fields, but seem to ignore them completely in all update function hooks. Is that intentional? If so, then the json tags are probably useless.
I have completed only half of the changes so far! But what about this question?
@yhabteab: I have already removed the
json
tags as they were, as you mentioned, useless. My initial idea was to refer to them within the JSON dump, but as most other fields are missing such a tag, I removed them here as well.
I mean, okay, you removed the json tag completely and my comment didn't even propose to remove the tags at all, but rather if you think these fields should also be included in the config dump, you need to update them regularly like the other fields, otherwise just exclude them via json:"-"
. Currently, they should still appear in the config dump with false-positive values, since both fields are exported.
I have tried to work on all of your comments and closed those discussions I though I have properly addressed. Please feel free to reopen those, if you think otherwise.
Furthermore, I made sure that the changed_at
value is also always being updated, as @yhabteab pointed out.
Please take another look and sorry for the broken diff as I needed to rebase the code :/
Next to addressing @julianbrost's comments, I have introduced two additional changes:
utils.Flag
helper-type.Rotation.Priority
was changed from int32
to sql.NullInt32
as the column is defined as NULL
able in the schema and, as it happened, somehow became NULL
in my testing setup, resulting in SQL type errors and a broken synchronization.I have changed the detection of initially deleted elements to cover this special case of the latest change being a deletion. Regarding the log entry flood of created elements, I reconsidered the current log levels and have degraded everything except errors and the summary after changes to DEBUG.
I was only able to get the error again by manually editing some deleted rotation members in the database.
2024-07-10T11:00:42.886+0200 ERROR runtime-updates Creating configuration element failed {"table": "rotation_member", "id": 30, "new": {"id": 30, "rotation_id": 16, "contact_group_id": 1}, "error": "creation callback error, schedule rotation member refers unknown rotation 16"}
Did editing involve setting deleted = 'n'
again on that row? If yes, that's not something that's really intended to be done (or if you do, you'd at least have to undelete all other dependencies as well). Otherwise, that might be a bug as you shouldn't be able to end up in the "Creating configuration element" case with a deleted = 'y'
row.
I was only able to get the error again by manually editing some deleted rotation members in the database.
2024-07-10T11:00:42.886+0200 ERROR runtime-updates Creating configuration element failed {"table": "rotation_member", "id": 30, "new": {"id": 30, "rotation_id": 16, "contact_group_id": 1}, "error": "creation callback error, schedule rotation member refers unknown rotation 16"}
Did editing involve setting
deleted = 'n'
again on that row? If yes, that's not something that's really intended to be done (or if you do, you'd at least have to undelete all other dependencies as well). Otherwise, that might be a bug as you shouldn't be able to end up in the "Creating configuration element" case with adeleted = 'y'
row.
The actual error from the "creation callback [states, that the] schedule rotation member refers unknown rotation 16". The creation callback cannot[^1] be called upon an deleted = 'y'
element. Thus, I would guess that you have manually revived a Rotation Member belonging to a still deleted Rotation? Otherwise, something would be wrong.
[^1]: Should not. I really hope so, at least.
Thus, I would guess that you have manually revived a Rotation Member belonging to a still deleted Rotation?
That's exactly what I mean when I wrote "by manually editing". Btw. just updated my local branch for the web part and now it is allowing me to delete event rules and noticed this noise.
2024-07-10T15:34:15.325+0200 WARN incident Incident refers unknown rule {"object": "dummy-7!onesecond-0", "incident": "#125", "rule_id": 2}
If you don't drop that faulty rule from incident#Rules
, it will always complain about that, and I'm not sure how to handle that properly. Sorry, I didn't notice that previously!
Btw. just updated my local branch for the web part and now it is allowing me to delete event rules and noticed this noise.
2024-07-10T15:34:15.325+0200 WARN incident Incident refers unknown rule {"object": "dummy-7!onesecond-0", "incident": "#125", "rule_id": 2}
If you don't drop that faulty rule from
incident#Rules
, it will always complain about that, and I'm not sure how to handle that properly. Sorry, I didn't notice that previously!
Good catch that I was able to reproduce.
There are a few relationship relations between the incident
relation - not directly addressed by this PR as it is no configuration - and relations being touched by this PR.
rule
via incident_rule
contact
, contactgroup
and schedule
via incident_contact
rule_escalation
via incident_rule_escalation_state
, pointing even further to incident_history
Furthermore, there is a link between the source
relation and object
, with object
being referred by incident
and event
.
On the other side, there is also available_channel_type
being referred by channel
, with the latter having the changed_at
and deleted
columns.
I am currently taking a look how to reflect deletions from the configuration tables there.
Following new fancy :sparkles: changes were introduced:
changed_at
was dropped as requested https://github.com/Icinga/icinga-notifications/pull/191#discussion_r1673643514 to support older MariaDB and MySQL versions :older_man:
Previously, the entire configuration stored in the database was synchronized every second. With the growth of configurations in live environments on the horizon, this would simply not scale well. This brings us to incremental updates.
By introducing two new columns - "changed_at" as a Unix millisecond timestamp and "deleted" as a boolean - for all tables referenced in the ConfigSet structure, SQL queries can be modified to retrieve only those rows with a more recent timestamp. The "deleted" column became necessary to detect disappearances, since the synchronization now only takes newer items into account. Some additional fields needed to be added to the ConfigSet to track relationships.
Even though the codebase served well at the time, there was some code that did almost the same thing as other code, just in different ways. So a huge refactoring was done. This resulted in an internal generic function that handles all synchronization with custom callbacks.
The web counterpart is being developed in https://github.com/Icinga/icinga-notifications-web/pull/187.
Closes #5.