ghost opened this issue 3 years ago
Hello @chungktran
I understand the issue and it is a legitimate ask.
That said, the current configuration is an intentionally opinionated implementation. That doesn't mean it can't change, but this is a tricky one, and before going further I need to deeply understand the final use case behind it and why the current behavior is not enough.
For now, the severity of an alert follows the notification binding policy. So an alert has an unchanging severity, that is true, but its notification destination can change depending on the team/user.
If we take your example again, the system-common module has a CPU detector using an unchangeable `critical`/`major` severity pair. That can make sense for static infrastructure like on-premise, but it can be undesirable for dynamic environments like cloud IaaS, where CPU is not that important.
Suppose you use different notification services as a standard, like PagerDuty for on-call and Slack for simple notification. You can then bind:

- the `notifications` variable to respect this standard (critical/major to PagerDuty and lower severities to Slack)
- the `cpu_notifications` variable to Slack instead of PagerDuty, to override the standard defined by the previous configuration

The alert will still have the same severity, true, but no on-call is raised, only a non-disruptive Slack message, because it is not important in this case. I understand this does not directly address your issue, but it can serve the same purpose (depending on yours, obviously).
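For illustration, here is a minimal sketch of what that binding could look like when instantiating the module. The module source path, environment, credential IDs, and channel name are placeholders, and the exact notification string format and variable types should be double-checked against the module documentation:

```hcl
module "signalfx_system_common" {
  source      = "github.com/claranet/terraform-signalfx-detectors//modules/smart-agent_system-common"
  environment = "prod"

  # Standard binding: critical/major page the on-call, lower severities go to Slack.
  notifications = {
    critical = ["PagerDuty,credentialId"]
    major    = ["PagerDuty,credentialId"]
    minor    = ["Slack,credentialId,alerts"]
    warning  = ["Slack,credentialId,alerts"]
    info     = ["Slack,credentialId,alerts"]
  }

  # Override for the CPU detector only: same severities, but nobody is paged.
  cpu_notifications = {
    critical = ["Slack,credentialId,alerts"]
    major    = ["Slack,credentialId,alerts"]
  }
}
```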
In addition to the complexity involved in making this dynamic, keeping it static is also a way to standardize a recommended severity (and its notification destination) and to force contributors to think about the most relevant severity when they create new detectors.
Please tell me if this notification binding makes sense for you, and if not, why it cannot work for you. Thanks.
@xp-1000 Thanks for your feedback. I'll see if I can have my requirements met by doing what you suggested. If you don't mind, please keep this request open for now.
@xp-1000 I reviewed the variables and tried to make what you suggested work for us; unfortunately, it doesn't.
The reason for my use case is this: I have a team in OpsGenie that takes all alerts and makes alert routing rules based on Priority. Priority in OpsGenie maps to the `severity` key in SignalFx detectors.
With the current design of the detectors I have to create a different team in OpsGenie for CRIT and MAJOR alerts, because the detectors don't allow the `severity` key to be changed.
With my proposal I could have just one OpsGenie team and make CRIT detectors send `Major` for `severity` instead of being forced to send `Critical`. So, with the ability to change the `severity` key in the detectors, I can ultimately let OpsGenie make all the routing decisions.
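To make the desired end state concrete, here is a purely hypothetical sketch: the `cpu_severity_critical` variable does not exist in the modules today, the module source and OpsGenie notification string are placeholders, and the exact notification string format should be checked against the SignalFx provider documentation:

```hcl
module "signalfx_system_common" {
  source      = "github.com/claranet/terraform-signalfx-detectors//modules/smart-agent_system-common"
  environment = "prod"

  # Every severity goes to the same OpsGenie team; OpsGenie then routes on priority.
  notifications = {
    critical = ["Opsgenie,credentialId,credentialName,teamName,teamId,Team"]
    major    = ["Opsgenie,credentialId,credentialName,teamName,teamId,Team"]
    minor    = ["Opsgenie,credentialId,credentialName,teamName,teamId,Team"]
    warning  = ["Opsgenie,credentialId,credentialName,teamName,teamId,Team"]
    info     = ["Opsgenie,credentialId,credentialName,teamName,teamId,Team"]
  }

  # Hypothetical (requested) variable: downgrade the CPU detector's critical
  # rule to Major so OpsGenie receives a lower priority for this team.
  cpu_severity_critical = "Major"
}
```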
Hello @chungktran
OK, thank you for the feedback!
As I said before, the way we currently create and template our detectors is opinionated and "per severity per rule" oriented, so as you can imagine it will be difficult to make this fully customizable.
For example, consider the variable naming itself, which contains the severity rule name (e.g. `my_detector_threshold_critical`).
If I understood your use case correctly, being able to change only the severity attribute (https://registry.terraform.io/providers/splunk-terraform/signalfx/latest/docs/resources/detector#severity) should be good enough for you, am I right?
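For context, this is the attribute on the raw `signalfx_detector` resource that would need to be exposed by the modules; the detector below is a simplified, illustrative example rather than what the generated modules actually contain:

```hcl
resource "signalfx_detector" "cpu" {
  name = "CPU utilization"

  program_text = <<-EOF
    signal = data('cpu.utilization').mean(by=['host']).publish('signal')
    detect(when(signal > 90, lasting='15m')).publish('CRIT')
  EOF

  rule {
    detect_label  = "CRIT"
    description   = "CPU utilization is too high"
    # The attribute discussed here; valid values include Critical, Major, Minor, Warning, Info.
    severity      = "Major"
    notifications = ["Slack,credentialId,alerts"]
  }
}
```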
This will be a little weird, because you would still have to use the variables of the original severity for a detector/rule whose severity you changed (e.g. changing a rule's severity from `critical` to `major` would be possible, but if you then want to change the threshold you would still have to update the `my_detector_threshold_critical` variable).
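For example, with a hypothetical per-rule severity variable (nothing below exists in the modules today except the `*_threshold_critical` naming pattern), a module call could look like this:

```hcl
module "signalfx_system_common" {
  source      = "github.com/claranet/terraform-signalfx-detectors//modules/smart-agent_system-common"
  environment = "prod"
  # notifications = { ... }  # omitted for brevity

  # Hypothetical: the CPU detector's "critical" rule now raises a Major alert...
  cpu_severity_critical = "Major"

  # ...but its threshold is still tuned through the variable named after the
  # ORIGINAL severity, which is the slightly confusing part described above.
  cpu_threshold_critical = 95
}
```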
Is this enough for you?
Honestly, making the whole thing fully dynamic, including the variable names, will be hard, and maybe the only solution would be to use Terraform CDK to enjoy all the features of a real language.
Is your feature request related to a problem? Please describe.
Currently, the detectors' program text publishes CRIT, MAJOR, or WARN strings (I do not see INFO or DEBUG), which are in turn used by `detect_label` in the `rule`s to send out notifications. I would like to propose turning those into string variables so that a CRIT can be overridden to be a MAJOR, for example. The reason for this request is that an event that is CRIT for one team is not necessarily a CRIT for another team. A good example of this is the `smart-agent_system-common` detectors.
Describe the solution you'd like
Using the `smart-agent_system-common` module as an example: in this file, https://github.com/claranet/terraform-signalfx-detectors/blob/master/modules/smart-agent_system-common/variables-gen.tf, define extra parameters for each detector, for example `heartbeat_crit_value`, `heartbeat_major_value`, and `heartbeat_warn_value`. The majority of the modules will only need the first two extra parameters. Then, in the module's detector tf file, https://github.com/claranet/terraform-signalfx-detectors/blob/master/modules/smart-agent_system-common/detectors-gen.tf, update the `program_text` of each resource to `.publish('${var.heartbeat_crit_value}')` or `.publish('${var.heartbeat_major_value}')`. With the changes above, a CRIT event can be turned into any other event simply by overriding the variable.
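As a rough sketch of the idea (the variable name is the one suggested above; the detector body is simplified and not a copy of what the generated files actually contain):

```hcl
variable "heartbeat_crit_value" {
  description = "Label published by the heartbeat detector's critical rule (overridable)"
  type        = string
  default     = "CRIT"
}

resource "signalfx_detector" "heartbeat" {
  name = "System heartbeat"

  # The published label comes from the variable instead of a hard-coded 'CRIT'.
  program_text = <<-EOF
    from signalfx.detectors.not_reporting import not_reporting
    signal = data('cpu.utilization').publish('signal')
    not_reporting.detector(stream=signal, resource_identifier=['host'], duration='20m').publish('${var.heartbeat_crit_value}')
  EOF

  rule {
    # detect_label must match the label published by program_text.
    detect_label = var.heartbeat_crit_value
    severity     = "Critical"
    description  = "Heartbeat not received"
  }
}
```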
Describe alternatives you've considered
I have read through the modules and have not found a solution for this request.