claranet / terraform-signalfx-detectors

Collection of terraform modules for SignalFx detectors.
Mozilla Public License 2.0
22 stars 31 forks source link

fix: replace CPU transformation by lasting evaluation #520

Closed Tulux closed 7 months ago

Tulux commented 7 months ago

The min(over='1h') transformation will gather minimal values on a rolling period of 1h which doesn't make sense since this monitor intends to display CPU usage. Plus this transformation happens after splunk time aggregation which means values will vary depending on the displayed period time, ie: a 24h window will likely show lower values than a 7 days period since time aggregation is done with mean().

I think it was an error from the original commit.

pdecat commented 7 months ago

Could you please provide some rationale about this change?

Tulux commented 7 months ago

@pdecat I added note in first comment

pdecat commented 7 months ago

The "idea" of this detector is to raise an alert if the CPU usage never goes down for a long period. We do not want to alert on every spike. Also, using CPU is a good thing as we are paying for it.

If this does not fit your use case, just customize the variable.

Tulux commented 7 months ago

The "idea" of this detector is to raise an alert if the CPU usage never goes down for a long period. We do not want to alert on every spike. Also, using CPU is a good thing as we are paying for it.

This should be done in the detect() part where you can set a period detection of 1h via when() which results in a same behavior. Transformation functions are calculated after detector time aggregation and they also hide the original metric which is a barrier for operational run.

If this does not fit your use case, just customize the variable.

Yep I did, however the point of this PR is to provide a correct default detector which brings original metric visualization.