chvvkumar / Monitoring

Monitor ESXi, Synology, Docker, PiHole and Raspberry Pi and Windows using Grafana, InfluxDB and Telegraf
GNU General Public License v3.0

Questions - Synology Docker #2

Closed n1nj4888 closed 6 years ago

n1nj4888 commented 6 years ago

First off, this is a great write-up on the TIG stack and how to monitor a number of devices! I have a few questions that I'd like to ask if possible?

(1) Can you add some detail as to how you configured Telegraf, InfluxDB and Grafana? For example, was it based on your own setup or an existing tutorial?

(2) Do you host the Telegraf/InfluxDB/Grafana stack as docker containers on the Synology NAS itself (and if so, which images did you use?) or are you hosting this on another device?

(3) For the "Monitoring Docker" section, when you add the following lines, can you provide more info on the "synology.lan" address? Should this be entered exactly as "synology.lan" (and if so, what does this resolve to?) or should it be entered as something like "SYNOLOGY NAS LAN IP ADDRESS" (i.e. "192.168.1.x")?

admin@DiskStation:~$ cat /var/packages/Docker/etc/dockerd.json
{
  "hosts" : [ "tcp://synology.lan:2375", "unix:///var/run/docker.sock" ],
  "registry-mirrors" : []
}

(4) Also, given that the above method makes a change to the synology Docker package config, do you know whether this is overwritten when new Synology Docker package releases are installed?

Many thanks!

chvvkumar commented 6 years ago

Thank you!

(1) Can you add some detail as to how you configured Telegraf, InfluxDB and Grafana? For example, was it based on your own setup or an existing tutorial?

Telegraf was straightforward. I just followed the official documentation here: Telegraf

(2) Do you host the Telegraf/InfluxDB/Grafana stack as docker containers on the Synology NAS itself (and if so, which images did you use?) or are you hosting this on another device?

I did host all of them as docker containers on the NAS. But I have since moved to a dedicated docker host (a VM running Ubuntu) since I wanted to play with docker compose and Kubernetes. Here is my docker compose for this new environment: Docker compose

This allows me to deploy the whole environment in one go and not have to configure each individual container separately. I still have InfluxDB running as a container on the NAS, though. The reason is that the NAS and the Raspberry Pi that runs telegraf are backed by a UPS but the VM host (Dell R210II) is not. I wanted the database to be up and available for telegraf even if the other containers are down.

(3) For the "Monitoring Docker" section, when you add the following lines, can you provide more info on the "synology.lan" address? Should this be entered exactly as "synology.lan" (and if so, what does this resolve to?) or should it be entered as something like "SYNOLOGY NAS LAN IP ADDRESS" (i.e. "192.168.1.x")?

admin@DiskStation:~$ cat /var/packages/Docker/etc/dockerd.json
{
  "hosts" : [ "tcp://synology.lan:2375", "unix:///var/run/docker.sock" ],
  "registry-mirrors" : []
}

Synology.lan is the hostname of my NAS on my internal network. I believe an IP address should work just the same.
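If you are unsure whether a hostname or an IP address is the right thing to put in dockerd.json, a quick check from the machine that runs telegraf can help. The following is only a minimal sketch in Python, not part of the original setup: the function names are made up for illustration, and it assumes the Docker daemon is listening on TCP port 2375 as in the config above.

```python
import socket
from typing import Optional


def resolve_host(name: str) -> Optional[str]:
    """Return the IPv4 address a name resolves to, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except socket.gaierror:
        return None


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If resolve_host("synology.lan") returns None on your network, the name is not known to your DNS and the NAS IP address is the safer choice. port_open(<address>, 2375) then tells you whether the Docker endpoint is reachable at all.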

(4) Also, given that the above method makes a change to the synology Docker package config, do you know whether this is overwritten when new Synology Docker package releases are installed?

At least in my case, updating the docker package broke that part of the config. This is one of the reasons I moved to a dedicated docker host.
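If you stay on the NAS, one way to cope with this is to re-check the file after each package update. Below is a rough sketch in Python, assuming the file path and the "hosts" layout shown above; the function names are hypothetical, and the Docker package would presumably still need a restart before a change takes effect.

```python
import json
from pathlib import Path

# Path as shown in the thread; adjust if your package stores it elsewhere.
DOCKERD_JSON = Path("/var/packages/Docker/etc/dockerd.json")


def ensure_tcp_host(config: dict, endpoint: str) -> bool:
    """Add endpoint to config["hosts"] if it is missing.

    Returns True when the config was changed, False when it was already present.
    """
    hosts = config.setdefault("hosts", [])
    if endpoint in hosts:
        return False
    hosts.insert(0, endpoint)
    return True


def repair_file(path: Path, endpoint: str) -> bool:
    """Load the JSON file, re-add the endpoint if needed, and write it back."""
    config = json.loads(path.read_text())
    changed = ensure_tcp_host(config, endpoint)
    if changed:
        path.write_text(json.dumps(config, indent=2) + "\n")
    return changed
```

For example, repair_file(DOCKERD_JSON, "tcp://synology.lan:2375") would put the TCP entry back if an update removed it, and leave the file untouched otherwise.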

n1nj4888 commented 6 years ago

Thanks for the great answers! I have some more (noob!) questions if I may?

(1) You use a single telegraf.conf file for all your measurement configs, using the same “telegraf” database within InfluxDB - Given all the measurements write to the single telegraf DB name, how do you know that measurements from device 1 (say your ASUS Router) aren’t “polluting” the measurements logged for Device 2 (the Synology NAS)? I tried to get around this potential issue by configuring separate telegraf.conf files in “/etc/telegraf/telegraf.d/” named say telegraf-router.conf (with the inputs being only your customised Asus Router sections and the output being a DB named “telegraf-router” in InfluxDB) and telegraf-synology.conf (with the inputs being only your customised Synology sections and the output being a DB named “telegraf-synology” in InfluxDB)... Strangely, both the telegraf-synology and telegraf-router databases in InfluxDB seem to contain the measurements for both (including measurements called snmp.SYNO and snmp.rt66n).

(2) After copying your Synology dashboard (again, awesome!) and editing the panels/JSON, all seem to show the correct metrics in Grafana apart from the System Uptime Panel which shows around 2 days - Running “uptime” on an SSH session direct to the Synology NAS indicates 15+ days so I'm trying to figure out whether there may be an issue with the dashboard or the metrics?

chvvkumar commented 6 years ago

Sure!

(1) You use a single telegraf.conf file for all your measurement configs, using the same “telegraf” database within InfluxDB - Given all the measurements write to the single telegraf DB name, how do you know that measurements from device 1 (say your ASUS Router) aren’t “polluting” the measurements logged for Device 2 (the Synology NAS)? I tried to get around this potential issue by configuring separate telegraf.conf files in “/etc/telegraf/telegraf.d/” named say telegraf-router.conf (with the inputs being only your customised Asus Router sections and the output being a DB named “telegraf-router” in InfluxDB) and telegraf-synology.conf (with the inputs being only your customised Synology sections and the output being a DB named “telegraf-synology” in InfluxDB)... Strangely, both the telegraf-synology and telegraf-router databases in InfluxDB seem to contain the measurements for both (including measurements called snmp.SYNO and snmp.rt66n).

The measurements are written to the same database, but each one is written with a 'property' alongside it, so to speak. This property can be anything: a timestamp, a value or a string. In this case, each measurement has an 'agent_host' property (a tag, in InfluxDB terms) that identifies the origin of that particular measurement.

You can easily see this in two ways:

  1. In Grafana, if you look at the panel properties, you can restrict a panel to one host by specifying an 'agent_host' value, like so (this applies to this particular implementation; as the second example shows, the property name can differ):

image

This will restrict that panel's data to the host 192.168.1.5 even if there are measurements from multiple hosts in the 'snmp.SYNO' measurement.

  2. Using InfluxDB Studio, if you run a query against a database that multiple hosts write to, you can see the 'engine_host' values alternate between docker1 and DiskStation (my two docker hosts). So even if the same parameters are measured on both hosts, their 'engine_host' values make it easy to tell the measurements apart.

image
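To make the idea concrete, here is a small Python sketch of how points that share one measurement are kept apart by a tag such as 'agent_host'. It is only an illustration: the sample points, the second IP address and the field values are invented, and the helper names are hypothetical. The query string it builds follows ordinary InfluxQL quoting (double quotes for identifiers, single quotes for tag values).

```python
from typing import Dict, List

# Invented sample points: two devices writing into the same measurement.
POINTS: List[Dict] = [
    {"measurement": "snmp.SYNO", "tags": {"agent_host": "192.168.1.5"}, "fields": {"sysUpTime": 1000}},
    {"measurement": "snmp.SYNO", "tags": {"agent_host": "192.168.1.9"}, "fields": {"sysUpTime": 2000}},
    {"measurement": "snmp.SYNO", "tags": {"agent_host": "192.168.1.5"}, "fields": {"sysUpTime": 1100}},
]


def filter_by_tag(points: List[Dict], tag: str, value: str) -> List[Dict]:
    """Keep only the points whose tag matches the given value."""
    return [p for p in points if p["tags"].get(tag) == value]


def influxql_for_host(measurement: str, tag: str, value: str) -> str:
    """Build the kind of query a Grafana panel ends up sending to InfluxDB."""
    return f"SELECT * FROM \"{measurement}\" WHERE \"{tag}\" = '{value}'"


if __name__ == "__main__":
    print(len(filter_by_tag(POINTS, "agent_host", "192.168.1.5")))
    print(influxql_for_host("snmp.SYNO", "agent_host", "192.168.1.5"))
```

The same filter works with 'engine_host' for the docker measurements; only the tag name changes.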

Generally, you will have one telegraf.conf / telegraf instance. Telegraf merges telegraf.conf and every .conf file in telegraf.d into a single configuration, so each instance effectively runs one config in which every input feeds every output; that is why both of your databases ended up with both sets of measurements. If you need truly separate configs, you will have to run a second telegraf instance with the second config file. Generally though, you don't need to in a home environment. I have all my stuff running off of a single telegraf instance.

(2) After copying your Synology dashboard (again, awesome!) and editing the panels/JSON, all seem to show the correct metrics in Grafana apart from the System Uptime Panel which shows around 2 days - Running “uptime” on an SSH session direct to the Synology NAS indicates 15+ days so I'm trying to figure out whether there may be an issue with the dashboard or the metrics?

This is likely an issue with the calculation, since uptime is stored in InfluxDB in the form of unix time and I use Grafana to convert it to a readable format.

Check to make sure you are using the correct value for uptime (i.e. sysUpTime within snmp.SYNO) and the conversion and display settings like so:

image

image
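If you want to sanity-check the number outside Grafana, the conversion can be done by hand. The Python sketch below rests on an assumption that may not match your setup: that the stored field is the raw SNMP sysUpTime value, which the SNMP standard defines as TimeTicks (hundredths of a second) since the SNMP agent was last re-initialized. If that holds, treating the raw value as seconds gives a figure 100 times too large, and a restart of the SNMP service alone could make it lower than what "uptime" reports. The function names are made up for illustration.

```python
def timeticks_to_seconds(ticks: int) -> float:
    """Convert SNMP TimeTicks (hundredths of a second) to seconds."""
    return ticks / 100


def format_uptime(seconds: float) -> str:
    """Render a duration in seconds as days, hours and minutes."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes = rem // 60
    return f"{days}d {hours}h {minutes}m"


if __name__ == "__main__":
    raw = 129_600_000  # hypothetical raw sysUpTime reading
    print(format_uptime(timeticks_to_seconds(raw)))  # interpreted as TimeTicks
    print(format_uptime(raw))                        # interpreted as plain seconds
```

Comparing the two printed values with what "uptime" shows over SSH should indicate which unit the panel needs.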