carlosedp / cluster-monitoring

Cluster monitoring stack for clusters based on Prometheus Operator
MIT License
739 stars, 200 forks

FEATURE: Add Speedtest monitor module and custom dashboard. #116

Closed: Cian911 closed this 2 years ago

Cian911 commented 3 years ago

This PR adds a new module and Grafana dashboard to monitor your internet speed. A lot of the work is thanks to the speedtest-exporter project.

I've added a default scrape interval of 30 minutes and a scrape timeout of 2 minutes, but both can be configured as needed.
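Assuming the module exposes the exporter through a ServiceMonitor, as the other modules in this stack presumably do, these two defaults correspond to the `interval` and `scrapeTimeout` fields of a ServiceMonitor endpoint. The following is a minimal Python sketch, not the module's actual Jsonnet, that builds such a manifest as a plain dict and checks the one constraint worth remembering when tuning them: Prometheus rejects a scrape timeout longer than the scrape interval. The object name, namespace, label selector and port name (`speedtest-exporter`, `monitoring`, `app`, `http`) are hypothetical placeholders, and the duration parser only handles simple values such as `30m` or `120s`.

```python
import re

# Seconds per unit for simple "<number><unit>" durations such as "30m".
_UNITS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a simple duration like '30m' or '120s' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


def service_monitor(interval: str = "30m", timeout: str = "2m") -> dict:
    """Build a ServiceMonitor-like dict; names and labels are hypothetical."""
    # Prometheus refuses a scrape timeout greater than the scrape interval.
    if to_seconds(timeout) > to_seconds(interval):
        raise ValueError("scrapeTimeout must not exceed interval")
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "speedtest-exporter", "namespace": "monitoring"},
        "spec": {
            "selector": {"matchLabels": {"app": "speedtest-exporter"}},
            "endpoints": [
                {"port": "http", "interval": interval, "scrapeTimeout": timeout}
            ],
        },
    }
```

For example, `service_monitor()` yields an endpoint with `interval: 30m` and `scrapeTimeout: 2m`, while `service_monitor(interval="1m", timeout="2m")` raises `ValueError` instead of producing a config Prometheus would reject.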

[Screenshot: Dashboard in Operation]

Cian911 commented 2 years ago

Hey @carlosedp, do you think you might have time to review this?

radicalgeek commented 2 years ago

Hi @Cian911

I am really interested in this. I have downloaded the PR, and after enabling the module and running make vendor and make, I can't see any reference to it in the manifests?

Cian911 commented 2 years ago

> Hi @Cian911
>
> I am really interested in this. I have downloaded the PR, and after enabling the module and running make vendor and make, I can't see any reference to it in the manifests?

You have to enable it in vars.jsonnet, then run make, the same way as the other modules :)

It's disabled by default
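A quick way to check whether an enabled module actually made it into the generated output is to look for its files after running make. This is a minimal Python sketch assuming the generated manifests are written under a `manifests/` directory and that the module's file names contain `speedtest`; both are assumptions, and `find_module_manifests` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List


def find_module_manifests(manifest_dir: str, keyword: str = "speedtest") -> List[str]:
    """Return sorted names of generated files whose name contains `keyword`.

    The directory layout and the file naming convention are assumptions.
    """
    root = Path(manifest_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"no manifest directory at {root}")
    # Search recursively and match case-insensitively on the file name only.
    return sorted(
        p.name for p in root.rglob("*")
        if p.is_file() and keyword in p.name.lower()
    )
```

Calling `find_module_manifests("manifests")` after make and getting an empty list would suggest the module was not picked up, for example because vars.jsonnet was edited but not saved.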

radicalgeek commented 2 years ago

> > Hi @Cian911 I am really interested in this. I have downloaded the PR, and after enabling the module and running make vendor and make, I can't see any reference to it in the manifests?
>
> You have to enable it in vars.jsonnet, then run make, the same way as the other modules :)
>
> It's disabled by default

Yeah, I did that. What I forgot to do in my excitement was save the file! Works a charm, just waiting for the first scrape now. Nice work, thank you.

Cian911 commented 2 years ago

Awesome and thank you @radicalgeek!

Cian911 commented 2 years ago

Hey again @carlosedp, just pinging you again in case you haven't seen this yet. Would love to get this reviewed by yourself

carlosedp commented 2 years ago

Sorry, gonna review it this weekend. Thanks for the PR.

On Fri, Oct 22, 2021, 12:54 Cian Gallagher wrote:

> Hey again @carlosedp, just pinging you again in case you haven't seen this yet. Would love to get this reviewed by yourself

Cian911 commented 2 years ago

Casual bump again @carlosedp :)

carlosedp commented 2 years ago

Sorry about the huge delay! Thanks!

ToMe25 commented 2 years ago

@Cian911 For me this does not work at all at the moment. It references a data source named DS_PROMETHEUS, but according to Grafana the only data source I have is named prometheus. Replacing it through the Grafana interface generates a JSON file that is almost identical to yours, except for the replaced data source. Is there a reason why the data source has to be DS_PROMETHEUS? If there isn't, should I just make a PR with the modified Grafana dashboard?

Edit: I would also recommend adding a 15 minute auto refresh :)

Cian911 commented 2 years ago

@ToMe25 DS_PROMETHEUS is basically short for "datasource prometheus". It's the name of the input declared in the dashboard's `__inputs` section:

```json
"__inputs": [
  {
    "name": "DS_PROMETHEUS",
    "label": "Prometheus",
    "description": "",
    "type": "datasource",
    "pluginId": "prometheus",
    "pluginName": "Prometheus"
  }
]
```

You're going to have to elaborate further on the problem you're having. Have you modified/updated your grafana deployment recently?
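One plausible explanation for ToMe25's symptom, not confirmed in this thread: `__inputs` is processed by Grafana's import dialog, but a dashboard loaded through file provisioning generally does not get its `${DS_PROMETHEUS}` placeholders substituted, so panels may point at a data source that does not exist. Below is a minimal Python sketch of the workaround ToMe25 applied by hand, assuming the panels reference the placeholder string `${DS_PROMETHEUS}` and that the real data source is named `prometheus` (the name reported above). `resolve_datasource` is a hypothetical helper name.

```python
def resolve_datasource(dashboard: dict,
                       placeholder: str = "${DS_PROMETHEUS}",
                       datasource: str = "prometheus") -> dict:
    """Return a copy of `dashboard` with the data source placeholder resolved.

    Grafana 7.x stores a panel's data source as a plain string, so any string
    value equal to `placeholder` is replaced. The top-level `__inputs` block,
    which only the import UI uses, is dropped.
    """
    def walk(node):
        # Rebuild containers so the caller's dashboard is left untouched.
        if isinstance(node, dict):
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(value) for value in node]
        return datasource if node == placeholder else node

    resolved = walk(dashboard)
    resolved.pop("__inputs", None)
    return resolved
```

Used on a dashboard loaded with `json.load`, the result can be written back with `json.dump` before running make again.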

> Edit: I would also recommend adding a 15 minute auto refresh :)

You can already do this if you make a small modification to the module here: https://github.com/carlosedp/cluster-monitoring/blob/master/modules/speedtest_exporter.jsonnet#L49. It's just set as a default at the moment. I could make a PR to make it configurable via the vars file if that would be easier.

ToMe25 commented 2 years ago

@Cian911 The Grafana deployment I tested this with is pretty much the default Grafana deployment created by this repository.

> Have you modified/updated your grafana deployment recently?

I pulled the recent changes to this repo and updated my Grafana deployment by running make and make deploy, after enabling this in the vars file. That does mean, however, that my Grafana version is 7.0.3, while yours seems to be 7.2.0. Maybe that makes a difference?

> You can already do this if you make a small modification to the module here: https://github.com/carlosedp/cluster-monitoring/blob/master/modules/speedtest_exporter.jsonnet#L49. It's just set as a default at the moment. I could make a PR to make it configurable via the vars file if that would be easier.

I meant a Grafana dashboard refresh, which is currently disabled entirely. As far as I can tell, that line only defines how often the actual measurement takes place; it doesn't do anything to the Grafana dashboard.
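The distinction here is between the exporter's scrape interval, set in the module, and the dashboard's own auto-refresh, which Grafana stores in the top-level `refresh` key of the dashboard JSON (the dashboard suggested in the next paragraph sets it to `"15m"`). A minimal Python sketch of that change; `with_auto_refresh` is a hypothetical helper name, and it assumes the usual convention that a `false` or empty `refresh` value means auto-refresh is off.

```python
def with_auto_refresh(dashboard: dict, every: str = "15m") -> dict:
    """Return a copy of a Grafana dashboard model with auto-refresh enabled.

    The top-level "refresh" key controls how often the browser re-runs the
    panel queries. It is independent of how often Prometheus scrapes the
    exporter, so refreshing faster than the scrape interval shows no new data.
    """
    updated = dict(dashboard)  # shallow copy; panels are shared, not modified
    updated["refresh"] = every
    return updated
```

With a 30 minute scrape interval, a 15 minute refresh means each new measurement shows up on an open dashboard within at most 15 minutes of being scraped.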

This would be the modified version of the dashboard that I would suggest (generated with Grafana 7.0.3):

Dashboard.json ```json { "annotations": { "list": [ { "builtIn": 1, "datasource": "-- Grafana --", "enable": true, "hide": true, "iconColor": "rgba(0, 211, 255, 1)", "name": "Annotations & Alerts", "type": "dashboard" } ] }, "editable": true, "gnetId": null, "graphTooltip": 0, "links": [], "panels": [ { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "prometheus", "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "fill": 1, "fillGradient": 10, "gridPos": { "h": 9, "w": 8, "x": 0, "y": 0 }, "hiddenSeries": false, "id": 4, "legend": { "alignAsTable": true, "avg": true, "current": false, "max": true, "min": true, "rightSide": false, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true, "dataLinks": [] }, "percentage": false, "pluginVersion": "7.0.3", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": true, "targets": [ { "expr": "speedtest_ping_latency_milliseconds", "format": "table", "instant": false, "interval": "", "legendFormat": "Ping (ms)", "refId": "A" }, { "expr": "speedtest_jitter_latency_milliseconds", "format": "table", "instant": false, "interval": "", "legendFormat": "Jitter (ms)", "refId": "B" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Ping and Jitter (ms)", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:291", "format": "ms", "label": "Time", "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:292", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, 
"datasource": "prometheus", "description": "", "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "fill": 1, "fillGradient": 10, "gridPos": { "h": 9, "w": 8, "x": 8, "y": 0 }, "hiddenSeries": false, "id": 2, "legend": { "alignAsTable": true, "avg": true, "current": false, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true, "dataLinks": [] }, "percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": true, "targets": [ { "expr": "speedtest_download_bits_per_second*10^-6", "format": "table", "instant": false, "interval": "", "legendFormat": "Download Speed (Mbits/s)", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Download Speed (Mbits/s)", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:291", "format": "Mbits", "label": "Download Speed", "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:292", "decimals": null, "format": "dateTimeAsLocal", "label": "", "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } }, { "aliasColors": {}, "bars": false, "dashLength": 10, "dashes": false, "datasource": "prometheus", "fieldConfig": { "defaults": { "custom": {} }, "overrides": [] }, "fill": 1, "fillGradient": 10, "gridPos": { "h": 9, "w": 8, "x": 16, "y": 0 }, "hiddenSeries": false, "id": 3, "legend": { "alignAsTable": true, "avg": true, "current": false, "max": true, "min": true, "show": true, "total": false, "values": true }, "lines": true, "linewidth": 1, "nullPointMode": "null", "options": { "alertThreshold": true, "dataLinks": [] }, 
"percentage": false, "pluginVersion": "7.2.1", "pointradius": 2, "points": false, "renderer": "flot", "seriesOverrides": [], "spaceLength": 10, "stack": false, "steppedLine": true, "targets": [ { "expr": "speedtest_upload_bits_per_second*10^-6", "format": "table", "interval": "", "legendFormat": "Upload Speed (Mbits/s)", "refId": "A" } ], "thresholds": [], "timeFrom": null, "timeRegions": [], "timeShift": null, "title": "Upload Speed (Mbits/s)", "tooltip": { "shared": true, "sort": 0, "value_type": "individual" }, "type": "graph", "xaxis": { "buckets": null, "mode": "time", "name": null, "show": true, "values": [] }, "yaxes": [ { "$$hashKey": "object:291", "decimals": null, "format": "Mbits", "label": "Upload Speed", "logBase": 1, "max": null, "min": null, "show": true }, { "$$hashKey": "object:292", "decimals": null, "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true } ], "yaxis": { "align": false, "alignLevel": null } } ], "refresh": "15m", "schemaVersion": 25, "style": "dark", "tags": [], "templating": { "list": [] }, "time": { "from": "now-6h", "to": "now" }, "timepicker": {}, "timezone": "", "title": "Speedtest Dashboard", "uid": "-fs18ztMz", "version": 1 } ```
Cian911 commented 2 years ago

@ToMe25 I will take a look; it could be a consequence of the version change, as you suggest.

> I meant a Grafana dashboard refresh, which is currently disabled entirely. As far as I can tell, that line only defines how often the actual measurement takes place; it doesn't do anything to the Grafana dashboard.

I see. Yes, I can create a PR to do this.

> This would be the modified version of the dashboard that I would suggest (generated with Grafana 7.0.3)

This shows up as an invalid dashboard when I try to import it.

ToMe25 commented 2 years ago

@Cian911

> This shows up as an invalid dashboard when I try to import it.

It does for me too, weird. Running make and make deploy again with the broken one worked (and applied the changes after the next Grafana restart), so I didn't test importing it; seems like that was a mistake. There seems to have been a comma in one place where there shouldn't have been one, idk how it got there though. I updated my comment above, now importing that one works for me.
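Since Grafana's import dialog only reports the dashboard as invalid, a local syntax check can pinpoint problems like this stray comma before importing. One hedged guess at why make still worked: Jsonnet, unlike strict JSON, tolerates trailing commas, so a file read through Jsonnet may be re-serialized cleanly. A minimal Python sketch using only the standard `json` module; `check_dashboard_json` is a hypothetical helper, and it checks syntax only, not whether Grafana accepts the dashboard model.

```python
import json
from typing import Optional


def check_dashboard_json(text: str) -> Optional[str]:
    """Return None if `text` parses to a JSON object, else an error description.

    Only checks JSON syntax and the top-level type, not Grafana's schema.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno are 1-based positions of the first parse failure.
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return "top-level value is not a JSON object"
    return None
```

For example, `check_dashboard_json('{"refresh": "15m",}')` returns a message starting with `line 1, column`, pointing at the character right after the trailing comma.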

Also, the legend names are somehow broken after a Grafana restart even though they weren't before; idk what to do about that.

Cian911 commented 2 years ago

@ToMe25 The updated json works for me now also when I import it.

> Running make and make deploy again with the broken one worked (and applied the changes after the next Grafana restart), so I didn't test importing it; seems like that was a mistake. There seems to have been a comma in one place where there shouldn't have been one, idk how it got there though. I updated my comment above, not importing that one works for me.

Are you saying this is working for you now, just to be clear?

> Also, the legend names are somehow broken after a Grafana restart even though they weren't before; idk what to do about that.

I don't know why this would be, I haven't experienced this issue myself.

ToMe25 commented 2 years ago

@Cian911

> Are you saying this is working for you now, just to be clear?

Yes, I meant "now importing that one works", the "not" was a typo.

> I don't know why this would be, I haven't experienced this issue myself.

Since restarting Grafana doesn't fix it, and neither does deleting and re-adding the query, I have no idea what to do about it, so I will ignore it instead. It says "Value #A" and "Value #B" instead of the legend text.

ToMe25 commented 2 years ago

@Cian911 Did you find any issues with my dashboard changes? If not, I would probably make a PR for that tomorrow. The legend issue seems to be that my Grafana (I assume Grafana 7.0.3 in general) doesn't show the legend correctly in the table display mode, so I can't fix that. However, iirc that is broken in some other default dashboards as well, so I'm not that bothered by it.
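The thread does not say how the legend was eventually fixed. One plausible cause, stated here as an assumption: every panel target in the dashboard above uses `"format": "table"`, and for table-formatted Prometheus results Grafana 7.x names the value column after the query's refId (`Value #A`, `Value #B`) rather than applying `legendFormat`. A minimal Python sketch that switches such targets to `"time_series"`, the Prometheus target format normally used by graph panels; `use_time_series_targets` is a hypothetical helper and ignores panels nested inside rows.

```python
def use_time_series_targets(dashboard: dict) -> int:
    """Switch table-formatted panel targets to time_series, in place.

    Returns the number of targets changed. Only top-level panels are visited;
    panels nested inside collapsed rows are not handled.
    """
    changed = 0
    for panel in dashboard.get("panels", []):
        for target in panel.get("targets", []):
            if target.get("format") == "table":
                target["format"] = "time_series"
                changed += 1
    return changed
```

Running it a second time returns 0, so it is safe to apply to a dashboard that has already been converted.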

ToMe25 commented 2 years ago

I found a way to fix the legend issue :)