Closed lswith closed 2 years ago
I'm wondering what happens when tmCount
is 0 on this line: https://github.com/SumoLogic-Labs/slogen/blob/59d58d9c9ac440a88675755e333d1b800f9bde43/libs/monitor.go#L66
I think it should be guarded against.
Looking at my specific monitoring query it seems that the .Budget
isn't being filled with 0.9
but is 0
instead.
_view=xxx
| timeslice 6h
| sum(sliceGoodCount) as tmGood, sum(sliceTotalCount) as tmCount group by _timeslice
| fillmissing timeslice(1m)
| tmGood/tmCount as tmSLO
| (tmCount-tmGood) as tmBad
| total tmCount as totalCount
--> HERE --> | totalCount*(1-0) as errorBudget
--> HERE --> | ((tmBad/tmCount)/(1-0)) as sliceBurnRate
| if(queryEndTime() - _timeslice <= 6h,sliceBurnRate, 0 ) as latestBurnRate
| sum(tmGood) as totalGood, max(totalCount) as totalCount, max(latestBurnRate) as latestBurnRate
| (1-(totalGood/totalCount))/(1-0) as longBurnRate
| if (longBurnRate > 1 , 1,0) as long_burn_exceeded
| if ( latestBurnRate > 1, 1,0) as short_burn_exceeded
| long_burn_exceeded + short_burn_exceeded as combined_burn
This would actually fix the query and I can see that the the burn rate would exceed the value of 1.
Hi team, I've been using your tool extensively and I am loving it!
thnx @lswith, its very encouraging for us to know it being useful.
I'm wondering what happens when
tmCount
is 0 on this line:I think it should be guarded against.
Looking at my specific monitoring query it seems that the
.Budget
isn't being filled with0.9
but is0
instead._view=xxx | timeslice 6h | sum(sliceGoodCount) as tmGood, sum(sliceTotalCount) as tmCount group by _timeslice | fillmissing timeslice(1m) | tmGood/tmCount as tmSLO | (tmCount-tmGood) as tmBad | total tmCount as totalCount --> HERE --> | totalCount*(1-0) as errorBudget --> HERE --> | ((tmBad/tmCount)/(1-0)) as sliceBurnRate | if(queryEndTime() - _timeslice <= 6h,sliceBurnRate, 0 ) as latestBurnRate | sum(tmGood) as totalGood, max(totalCount) as totalCount, max(latestBurnRate) as latestBurnRate | (1-(totalGood/totalCount))/(1-0) as longBurnRate | if (longBurnRate > 1 , 1,0) as long_burn_exceeded | if ( latestBurnRate > 1, 1,0) as short_burn_exceeded | long_burn_exceeded + short_burn_exceeded as combined_burn
This would actually fix the query and I can see that the the burn rate would exceed the value of 1.
great catch on both, will fix and create a new release after testing them out by end of the day.
hey @lswith, made an attempt to fix both .Budget
not being set and tmCount
being 0 in v0.7.10 and tested it for a few configs.
please let me know if you still face the issue after upgrading to the new version. And mega thnx for reporting this critical bug and the cause along with it :)
Just confirmed. This fixed the issues with alerting! Thanks again
Hi team, I've been using your tool extensively and I am loving it!
I have come across an issue with the monitoring of an SLO.
My current alerting configuration is as follows:
When I evaluate the SLO over a 24h period, it is currently at 98.41897 (which is below the 99 to meet the SLO).
I would have expected that I would receive at least 1 email stating that this SLO is not being met, however all the monitors generated aren't being triggered.
I'm wondering if the calculation of one of these items may be incorrect?
Current version: There is no slogen command to output the version, but I'm pointing to the latest of the main branch.