argoproj / notifications-engine

Affordable notifications for Kubernetes
Apache License 2.0

Prevent slack Update policy from posting new messages #298

Open AndreiPetrusMihai opened 1 month ago

AndreiPetrusMihai commented 1 month ago

The issue:

At the moment, a message with a policy of Update can post a new message on Slack. This happens when the message is sent for a groupingKey that has no recorded timestamp, which is the case when no previous message was posted for that groupingKey.

This is not correct and it is quite misleading since one would expect a policy of Update to never be able to actually post new messages. It also means that there is no practical difference between the PostAndUpdate and Update policies when it comes to sending a message for a groupingKey which didn't have any previous message recorded.

It's important to know that just because a timestamp wasn't recorded, it doesn't mean that a message with a certain groupingKey wasn't previously sent. The dictionary of groupingKey: timestamp is kept in-memory, so upon a complete engine restart, these records would get lost.
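To make the behavior described above concrete, here is a minimal Go sketch of the decision. It is not the engine's actual code: `decide`, `Action`, and the `strictUpdate` flag are hypothetical names introduced for illustration, and the "skip" outcome is my reading of what this PR proposes for Update when no timestamp is recorded.

```go
package main

import "fmt"

// DeliveryPolicy mirrors the three policy names discussed in this PR.
type DeliveryPolicy int

const (
	Post DeliveryPolicy = iota
	PostAndUpdate
	Update
)

// Action is what the sender ends up doing for one message (hypothetical type).
type Action string

const (
	PostNew        Action = "post new message"
	UpdateExisting Action = "update existing message"
	Skip           Action = "skip"
)

// decide is a hypothetical helper, not a function from notifications-engine.
// timestamps stands in for the in-memory groupingKey -> timestamp record.
// strictUpdate=false models the behavior reported here; strictUpdate=true
// models the assumed proposed behavior (Update never posts a new message).
func decide(timestamps map[string]string, groupingKey string, policy DeliveryPolicy, strictUpdate bool) Action {
	_, recorded := timestamps[groupingKey]
	switch {
	case policy == Post:
		// Post always creates a new message.
		return PostNew
	case recorded:
		// PostAndUpdate and Update both edit the recorded message.
		return UpdateExisting
	case policy == Update && strictUpdate:
		// No recorded timestamp: Update has nothing to edit, so do nothing.
		return Skip
	default:
		// PostAndUpdate without a record, or Update under the reported behavior.
		return PostNew
	}
}

func main() {
	// Empty record, e.g. nothing posted yet or right after an engine restart.
	timestamps := map[string]string{}
	fmt.Println("reported:", decide(timestamps, "my-app", Update, false))
	fmt.Println("proposed:", decide(timestamps, "my-app", Update, true))

	// Once a timestamp is recorded, Update edits the existing message.
	timestamps["my-app"] = "1700000000.000100"
	fmt.Println("recorded:", decide(timestamps, "my-app", Update, true))
}
```

Because the record lives only in memory, the empty-map case also covers a restarted engine: a message may already exist in the channel, but with no timestamp to target, Update cannot edit it.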

This could be considered a breaking change if someone relied on the Update policy to post new messages. It could also be considered a fix if the correct behavior of Update is to never post a new message.


The use-case/scenario with which this behavior was found:

We have multiple argo apps and we want to receive notifications when an error occurs. This would mean notifications for failed syncs, maybe degraded apps, etc.

At the moment this is doable, but it is hard to keep track of which apps were fixed and which weren't, since the error messages are static. Even if the error for an app is now fixed, the error notification stays in the Slack channel, unchanged.

As a way to improve this experience, we want to do the following:

Having these 2 notifications would mean that errors get posted to the channel and, once fixed, the error messages are updated to reflect that the issue has been solved. This makes it much easier to follow and keep track of errors that still need fixing.

At the moment this doesn't work correctly. The successful sync messages do update existing error messages, but they also get posted when there is no corresponding error message for them.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 55.27%. Comparing base (f485671) to head (def4988). Report is 3 commits behind head on master.

:exclamation: Current head def4988 differs from pull request most recent head 15a938c

Please upload reports for the commit 15a938c to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
- Coverage   55.35%   55.27%   -0.08%
==========================================
  Files          35       35
  Lines        3438     3439       +1
==========================================
- Hits         1903     1901       -2
- Misses       1256     1258       +2
- Partials      279      280       +1
```


AndreiPetrusMihai commented 3 weeks ago

Hey @pasha-codefresh, could you maybe take a look at this PR when you have some spare time? Not sure who else to ping. Thanks!