SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

Alert has issues when use time shift & formula #6151

Open chensunny opened 1 week ago

chensunny commented 1 week ago

Bug description


Alerts give wrong results when a query combines a time shift with a formula. The same query works well in the dashboard but is wrong in the alert.

image

Expected behavior

How to reproduce

image

Version information

Additional context

I think this is because the dashboard uses the v4 API and the alert uses the v3 API. Can you change the alert to use the v4 API?


ahmadshaheer commented 1 week ago

@chensunny, can you please share the response formats in case of alerts and dashboards?

@srikanthccv, can you please help me reproduce this issue based on the data in the staging environment?

chensunny commented 1 week ago

https://signoz.automizely.org/api/v1/testRule

Another issue: the rule test fails if the formula contains a multiplication or a division, for example:

(B-A)/A
srikanthccv commented 1 week ago

signoz.automizely.org/api/v1/testRule

Another issue: the rule test fails if the formula contains a multiplication or a division, for example:

(B-A)/A

@chensunny please be a little more descriptive about the problem. It's not possible to understand what you mean by this.

srikanthccv commented 1 week ago

can you please help me reproduce this issue based on the data in the staging environment?

I will share a Loom video.

kobecal commented 6 days ago

It's easy to reproduce this issue. Here's an example process to reproduce.

  1. Query A: time shift by 86400, group by any field.
  2. Query B: same configuration as query A, except without the time shift.
  3. F1: B-A

Then you will find that the F1 result is wrong. Three reasons cause this issue.

First, the query should use v4/query_range. v3/query_range gives wrong results when calculating week-over-week or day-over-day data. Adding "version":"v4" to the alert rule should fix the issue in the panel, but it does not fix it in the alert.

The second reason is that query A does not populate the ShiftBy field, so queries A and B request data for the same time range. Assigning a correct ShiftBy should fix the query, but for now the issue is still not fixed.

The third reason is that the postProcess functions need to align the time-shifted result of query A, so that the results of A and B can be calculated at the same timestamps. The timeShift function's value should also be changed from the string "86400" to the number 86400.
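To make the alignment point concrete, here is a minimal sketch of why B-A only works once the shifted series is moved back onto B's timestamps. The function names, the sample values, and the use of second-resolution Unix timestamps are all assumptions made for illustration; none of this is taken from the SigNoz code.

```python
# Hypothetical illustration; these names do not come from the SigNoz codebase.
SHIFT = 86400  # one day, in seconds


def align_shifted(series, shift_by):
    """Move each timestamp of a time-shifted series forward by shift_by seconds."""
    return {ts + shift_by: value for ts, value in series.items()}


def subtract(b, a):
    """Evaluate B - A at every timestamp present in both series."""
    return {ts: b[ts] - a[ts] for ts in sorted(b.keys() & a.keys())}


# Query B: today's values, keyed by Unix timestamp in seconds.
b = {1_700_086_400: 120.0, 1_700_086_460: 130.0}
# Query A: the same metric one day earlier, still carrying yesterday's timestamps.
a = {1_700_000_000: 100.0, 1_700_000_060: 90.0}

unaligned = subtract(b, a)                      # no timestamps in common
aligned = subtract(b, align_shifted(a, SHIFT))  # day-over-day difference

print(unaligned)
print(aligned)
```

Without the alignment step the two series share no timestamps, so the formula has nothing to evaluate. If ShiftBy is never applied at all, A simply returns the same data as B and B-A collapses to zero.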

kobecal commented 4 days ago

The PR https://github.com/SigNoz/signoz/pull/6209 only fixes the alert panel issue. But the alert with timeShift functions is still wrong, because the time shift is not calculated correctly when the alert rule is evaluated in the backend.

srikanthccv commented 4 days ago

The third reason is that the postProcess functions need to align the time-shifted result of query A

https://github.com/SigNoz/signoz/blob/204728ff60346c906dbf13b7a53d99bf585a2be4/pkg/query-service/app/queryBuilder/functions.go#L312

https://github.com/SigNoz/signoz/blob/204728ff60346c906dbf13b7a53d99bf585a2be4/pkg/query-service/postprocess/process.go#L36

srikanthccv commented 4 days ago

But the alert with timeShift functions is still wrong, because the time shift is not calculated correctly when the alert rule is evaluated in the backend

What do you mean by this? Please elaborate!

kobecal commented 1 day ago

The third reason is that the postProcess functions need to align the time-shifted result of query A

https://github.com/SigNoz/signoz/blob/204728ff60346c906dbf13b7a53d99bf585a2be4/pkg/query-service/app/queryBuilder/functions.go#L312

https://github.com/SigNoz/signoz/blob/204728ff60346c906dbf13b7a53d99bf585a2be4/pkg/query-service/postprocess/process.go#L36

By default, the timeShift function's value is a string, and ApplyFunctions treats it as 0. For example, with the following timeShift configuration:

{"functions":[{"name":"timeShift","args":["604800"]}]}

the string "604800" does not work for alerts. With this configuration:

{"functions":[{"name":"timeShift","args":[604800]}]}

the integer 604800 should work for alerts.
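The string-versus-number behaviour can be sketched as follows. This is only a guess at the mechanism (a strict type check that falls back to 0), written to mirror the symptom described above. `shift_strict` and `shift_lenient` are hypothetical names and do not reflect how ApplyFunctions is actually implemented.

```python
import json


def shift_strict(arg):
    """Accept only real numbers; anything else, including "604800", becomes 0."""
    if isinstance(arg, bool) or not isinstance(arg, (int, float)):
        return 0
    return int(arg)


def shift_lenient(arg):
    """Also accept numeric strings such as "604800"."""
    if isinstance(arg, str):
        try:
            return int(float(arg))
        except (ValueError, OverflowError):
            return 0
    return shift_strict(arg)


as_string = json.loads('{"functions":[{"name":"timeShift","args":["604800"]}]}')
as_number = json.loads('{"functions":[{"name":"timeShift","args":[604800]}]}')

string_arg = as_string["functions"][0]["args"][0]
number_arg = as_number["functions"][0]["args"][0]

print(shift_strict(string_arg), shift_strict(number_arg))
print(shift_lenient(string_arg), shift_lenient(number_arg))
```

Under this assumption, either the UI has to emit a number or the backend has to parse numeric strings. Which of the two is the better fix is a design choice for the maintainers.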

kobecal commented 1 day ago

But the alert with timeShift functions is still wrong, because the time shift is not calculated correctly when the alert rule is evaluated in the backend

What do you mean by this? Please elaborate!

{"functions":[{"name":"timeShift","args":[604800]}]}

The query range API processes the timeShift function, calculates the time shift value, and assigns it to ShiftBy. The start and end time of the query are then recalculated from the ShiftBy value. But when an alert rule runs, ShiftBy is always 0, because the timeShift function is never executed. So queries A and B get the same result even though their timeShift configurations differ.
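A rough sketch of the difference between the two paths, assuming the shift is subtracted from both ends of the window. `shift_from_functions` and `query_window` are hypothetical helpers invented for this example; only the ShiftBy idea and the JSON shape come from the thread.

```python
def shift_from_functions(functions):
    """Return the timeShift argument in seconds, or 0 if no timeShift is configured."""
    for fn in functions:
        if fn.get("name") == "timeShift" and fn.get("args"):
            return int(fn["args"][0])
    return 0


def query_window(start, end, shift_by=0):
    """Recalculate the query range by moving it back shift_by seconds."""
    return start - shift_by, end - shift_by


functions = [{"name": "timeShift", "args": [604800]}]
start, end = 1_700_086_400, 1_700_090_000

# Query range path: the function is processed, so ShiftBy is 604800.
with_shift = query_window(start, end, shift_from_functions(functions))

# Alert path as described above: the function is never executed, so ShiftBy stays 0.
without_shift = query_window(start, end)

print(with_shift)
print(without_shift)
```

With ShiftBy left at 0, the shifted query asks for exactly the same window as the unshifted one, which matches the observation that A and B return the same result.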

srikanthccv commented 1 day ago

You are correct. We will fix that bug as well.