Altinity / clickhouse-grafana

Altinity Grafana datasource plugin for ClickHouse®
MIT License

Alerts not working #384

Closed: vasylkolomiets closed this issue 2 years ago

vasylkolomiets commented 3 years ago

Hello. I have Grafana 8.2.2 and clickhouse-grafana plugin 2.3.1. On this setup alerting is not working (screenshots: example1, example1_logs).

Alerts do not fire automatically, but when I test the rule in the UI and then exit with saving changes (no changes were made), the condition is triggered (screenshots: example2, example2_logs).

Slach commented 3 years ago

Why do you expect alerts to work without saving the dashboard or pressing Test rule? As far as I can see, this is expected behavior of Grafana itself and does not depend on the clickhouse-grafana data source.

Here are screenshots with a PostgreSQL data source, which is also not triggered before Test rule is pressed or the dashboard is saved (screenshot).

According to your logic the alert should trigger (screenshot).

But even after Test rule the alert is not triggered (screenshot).

For PostgreSQL it triggered only after some time, and only after pressing Save dashboard (screenshot).

vasylkolomiets commented 3 years ago

Hi @Slach. For example, alerting works for an Elasticsearch data source with a similar setup. I expect it to work automatically, without my intervention, if there is at least one entry in ClickHouse within the last 5 minutes.

In my case I set up a dashboard with one graph of critical logs from one service. Most of the time the response is No data, but occasionally a critical log arrives and the alert should send a notification about it; this does not happen.

Slach commented 3 years ago

In my case I set up a dashboard with one graph of critical logs from one service. Most of the time the response is No data, but occasionally a critical log arrives and the alert should send a notification about it; this does not happen.

Maybe I am missing something; could you provide more context? Did you press Save dashboard and the alert was still not triggered as you expected, even though data is present?

But the picture you provided in https://user-images.githubusercontent.com/72972027/142634001-aec5fbcd-4b0b-4eff-9b17-5f6c03c35c87.png shows the alert triggered.

vasylkolomiets commented 3 years ago

@Slach These are the settings of my alerts: https://user-images.githubusercontent.com/72972027/142634001-aec5fbcd-4b0b-4eff-9b17-5f6c03c35c87.png. In this screenshot the alert fired after saving the dashboard, but it had been created long before that and did not fire, although by its conditions it should have fired earlier. The save procedure launched the condition evaluation, and from the screenshot you can see that within 5 minutes the state should have returned to OK, but this did not happen.

After all this editing and saving, my alerts don't work (screenshot: example3).

In the ClickHouse data source I have a log table where, for example, the logs of one of my services are sent. I want the alert to be triggered if there was at least one record in the last 5 minutes; this is a critical log that may come only once a day, and it must be caught.

Based on my observations, the alert condition is evaluated only after saving the dashboard, which looks like a bug.

Slach commented 2 years ago

@vasylkolomiets I tried to reproduce the behavior for your pattern and it works as expected (screenshot).

Could you try a custom-compiled plugin? https://mega.nz/file/rFhiHDCY#uIhuxVx0siVJZXF3hfkHnNArU-xYdsbULn-4WWoY_YI

Just unpack the zip file into /var/lib/grafana/plugins/vertamedia-clickhouse-datasource.

vasylkolomiets commented 2 years ago

Hi @Slach. I observe the same problematic behavior. Maybe you can test it with my settings? My ClickHouse table:

CREATE TABLE logs1.test
(
    `timestamp` DateTime,
    `host` String,
    `filename` String,
    `message` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (timestamp, host, filename, message)
SETTINGS index_granularity = 8192
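
For clarity, the condition I want the alert to express over this table is simply "at least one record in the last 5 minutes"; as a standalone ClickHouse query that would be something like:

-- rough check of the alert condition: any critical log in the last 5 minutes?
SELECT count() AS critical_logs_last_5m
FROM logs1.test
WHERE timestamp >= now() - INTERVAL 5 MINUTE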

My dashboard JSON model:

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": "-- Grafana --",
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "target": {
          "limit": 100,
          "matchAny": false,
          "tags": [],
          "type": "dashboard"
        },
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "gnetId": null,
  "graphTooltip": 0,
  "id": 38,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "alert": {
        "alertRuleTags": {},
        "conditions": [
          {
            "evaluator": {
              "params": [
                0
              ],
              "type": "gt"
            },
            "operator": {
              "type": "and"
            },
            "query": {
              "params": [
                "A",
                "5m",
                "now"
              ]
            },
            "reducer": {
              "params": [],
              "type": "sum"
            },
            "type": "query"
          }
        ],
        "executionErrorState": "keep_state",
        "for": "0m",
        "frequency": "1m",
        "handler": 1,
        "name": "test-alert",
        "noDataState": "ok",
        "notifications": []
      },
      "datasource": "logs-s1",
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 0,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "auto",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom"
        },
        "tooltip": {
          "mode": "single"
        }
      },
      "targets": [
        {
          "dateTimeType": "DATETIME",
          "extrapolate": true,
          "format": "time_series",
          "formattedQuery": "SELECT $timeSeries as t, count() FROM $table WHERE $timeFilter GROUP BY t ORDER BY t",
          "intervalFactor": 1,
          "query": "SELECT\n    t,\n    groupArray((log, messages)) AS ga\nFROM (\n\n    SELECT\n        toUInt32(timestamp) * 1000 as t,\n        count(message) as messages,\n        concat(host, ' ', filename) as log\n    FROM logs1.test\n    WHERE\n       $from<=toUInt32(timestamp) and toUInt32(timestamp)<=$to\n        and toDateTime(t / 1000)> toDateTime($from) and toDateTime(t / 1000)<toDateTime($to)\n    GROUP BY\n        t,\n        log\n    ORDER BY\n        t,\n        log\n)\nGROUP BY t, log\nORDER BY t",
          "rawQuery": "SELECT\n    t,\n    groupArray((log, messages)) AS ga\nFROM (\n\n    SELECT\n        toUInt32(timestamp) * 1000 as t,\n        count(message) as messages,\n        concat(host, ' ', filename) as log\n    FROM logs1.test\n    WHERE\n       1637577380<=toUInt32(timestamp) and toUInt32(timestamp)<=1637577680\n        and toDateTime(t / 1000)> toDateTime(1637577380) and toDateTime(t / 1000)<toDateTime(1637577680)\n    GROUP BY\n        t,\n        log\n    ORDER BY\n        t,\n        log\n)\nGROUP BY t, log\nORDER BY t",
          "refId": "A",
          "round": "0s",
          "skip_comments": true
        }
      ],
      "thresholds": [
        {
          "colorMode": "critical",
          "op": "gt",
          "value": 0,
          "visible": true
        }
      ],
      "title": "Panel Title",
      "type": "timeseries"
    }
  ],
  "refresh": "",
  "schemaVersion": 32,
  "style": "dark",
  "tags": [],
  "templating": {
    "list": []
  },
  "time": {
    "from": "now-1h",
    "to": "now"
  },
  "timepicker": {},
  "timezone": "",
  "title": "test",
  "uid": "MIZ8V4tnz",
  "version": 16
}
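
To summarize the alert section of this model: it evaluates query A over the window from 5 minutes ago to now, once per minute (frequency 1m), sums the returned values, and should fire as soon as the sum is greater than 0 (for is 0m); noDataState is ok, so No data responses keep the alert in the OK state.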

I inserted data into my table and Grafana rendered a graph, but the alert does not fire:

insert into logs1.test (*) values (now(), 'testHost', 'testFile1', 'message2222')
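
To put several points inside the alert's 5-minute window, an insert along these lines (hypothetical test values) can be used:

insert into logs1.test (*) values
    (now() - INTERVAL 4 MINUTE, 'testHost', 'testFile1', 'message1'),
    (now() - INTERVAL 2 MINUTE, 'testHost', 'testFile1', 'message2'),
    (now(), 'testHost', 'testFile1', 'message3')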

Maybe I'm doing something wrong or the problem is more complicated.

Slach commented 2 years ago

It looks like your query is too complicated for the backend time range parser:

SELECT
    t,
    groupArray((log, messages)) AS ga
FROM (
    SELECT
        toUInt32(timestamp) * 1000 as t,
        count(message) as messages,
        concat(host, ' ', filename) as log
    FROM logs1.test
    WHERE
       $from<=toUInt32(timestamp) and toUInt32(timestamp)<=$to
        AND toDateTime(t / 1000)> toDateTime($from) and toDateTime(t / 1000)<toDateTime($to)
    GROUP BY
        t,
        log
    ORDER BY
        t,
        log
)
GROUP BY t, log
ORDER BY t

I propose replacing

    WHERE
       $from<=toUInt32(timestamp) and toUInt32(timestamp)<=$to
        AND toDateTime(t / 1000)> toDateTime($from) and toDateTime(t / 1000)<toDateTime($to)

with

    WHERE
       timestamp >= toDateTime($from) AND timestamp<= toDateTime($to)

Then press Save dashboard and wait for the alert state to change.
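
With that change applied, the full query would look like this (a sketch: the same query with only the WHERE clause simplified; the outer GROUP BY is also reduced to t alone, since grouping the outer query by both t and log would make groupArray collect one-element arrays):

SELECT
    t,
    groupArray((log, messages)) AS ga
FROM
(
    SELECT
        toUInt32(timestamp) * 1000 AS t,
        count(message) AS messages,
        concat(host, ' ', filename) AS log
    FROM logs1.test
    -- simplified time filter that the backend parser can handle
    WHERE timestamp >= toDateTime($from) AND timestamp <= toDateTime($to)
    GROUP BY t, log
    ORDER BY t, log
)
-- group by t alone so groupArray collects all logs per time point
GROUP BY t
ORDER BY t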

Slach commented 2 years ago

Feel free to comment on this issue if it still reproduces after the change, and I will re-open it.