KjellWolf / CW-Zabbix-Templates

Zabbix Templates for MinIO

Add readme to explain configuration #1

Closed GhaziTriki closed 3 weeks ago

GhaziTriki commented 3 months ago

Hello,

We want to use this in Zabbix 7.0. Could you please explain the configuration in the README file?

KjellWolf commented 3 months ago

Hey,

Sure. I can see if I can do this tonight. I have tested it with Zabbix 6.4 and 7.0. I cannot guarantee that everything will work perfectly, as we have only just set up our first mini-cluster.

But TL;DR: import the YAML into the templates in Zabbix; these macros then need to be populated on the hosts:

{$MINIO.DNS.NAME}
{$MINIO.DNS.WILDCARD}
{$S3.DRIVES.PER.ERASURE.SET}
{$S3.MINIO.ACCESS.KEY}
{$S3.MINIO.API.URL}
{$S3.MINIO.MAX.DEAD.DRIVES} # this will be replaced in a future version, because I want to calculate the max dead drives via the parity and set size metrics
{$S3.MINIO.PARITY}
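To make the macro list easier to check, here is a minimal Python sketch that reports which of the required macros are still empty for a host. The macro names are taken from the comment above; the `missing_macros` helper and the example values are hypothetical and not part of the template.

```python
# Macro names as listed in the comment above.
REQUIRED_MACROS = [
    "{$MINIO.DNS.NAME}",
    "{$MINIO.DNS.WILDCARD}",
    "{$S3.DRIVES.PER.ERASURE.SET}",
    "{$S3.MINIO.ACCESS.KEY}",
    "{$S3.MINIO.API.URL}",
    "{$S3.MINIO.MAX.DEAD.DRIVES}",
    "{$S3.MINIO.PARITY}",
]


def missing_macros(host_macros: dict) -> list:
    """Return the required macros that are absent or blank for one host."""
    return [
        name
        for name in REQUIRED_MACROS
        if not str(host_macros.get(name, "")).strip()
    ]


# Hypothetical host with only two macros filled in so far.
example_host = {
    "{$MINIO.DNS.NAME}": "node1.minio.example.dev:9000",
    "{$S3.MINIO.API.URL}": "https://my.loadbalancer.dev",
}

for name in missing_macros(example_host):
    print("still missing:", name)
```

Running this against a dictionary exported from your own host configuration gives a quick list of what is left to fill in before the items can collect data.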

GhaziTriki commented 3 months ago

Great @KjellWolf, we have 5 bare-metal cluster, with 1 load balancer among them. I would be happy to be your beta-tester 😁

KjellWolf commented 3 months ago

I've updated the comment; it seems I have made the change already... :D If you have ideas for triggers, let me know too.

Also maybe nice to know: this uses the Prometheus metrics.

KjellWolf commented 3 months ago

Added a small README. If something isn't clear, just hit me up.

GhaziTriki commented 2 months ago

I tried now but I got "Cannot perform request: Received HTTP/0.9 when not allowed". Can the readme be clearer?

KjellWolf commented 2 months ago

Hey,

I don't know exactly at which step this occurs, but I think it's about the {$S3.MINIO.API.URL} macro.

Did you specify https, like https://my.loadbalancer.dev? Can you share as much as possible about the setup?

Once I understand where this comes from, I'm happy to update the README.
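As a quick way to rule out the scheme question, here is a small Python sketch that checks whether a value intended for {$S3.MINIO.API.URL} starts with http:// or https:// and contains a host name. The `check_api_url` helper is hypothetical; it only inspects the string and does not contact the server, so it cannot confirm that the chosen scheme matches what the port actually speaks.

```python
from urllib.parse import urlsplit


def check_api_url(url: str) -> list:
    """Return a list of problems found in the URL string (empty if none)."""
    problems = []
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        problems.append("URL should start with http:// or https://")
    if not parts.hostname:
        problems.append("URL has no host name")
    return problems


# Example values: the first is the form suggested above, the second lacks a scheme.
for candidate in ("https://my.loadbalancer.dev", "my.loadbalancer.dev"):
    print(candidate, "->", check_api_url(candidate) or "looks fine")
```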

GhaziTriki commented 2 months ago

Currently I have the following cluster

image

Let's assume the following:

Based on that I put the config macros:

What would be the right configuration?

KjellWolf commented 2 months ago

Oh, I think I now know where I could have messed up.

In my example config I install the template on each node. (Though I think a solution that runs on the LB / standalone only would be preferable.)

So a config for one node would be:

{$MINIO.DNS.NAME} = node1.minio.example.dev:9000 # specified with the internal communication port, default 9000
{$MINIO.DNS.WILDCARD} = node1*
{$S3.MINIO.ACCESS.KEY} = Here is a difference: I do not run Prometheus directly on the nodes. I got mine with this command: mc admin config get [alias] prometheus
{$S3.MINIO.API.URL} = I set up with http / https at the start

Drives per set and parity are not important for the connection, just for the calculations, but it should still work.
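For the calculations mentioned here, a rough Python sketch of how a maximum number of dead drives could be derived from drives per set and parity. This is not how the template computes it; it assumes the commonly documented MinIO quorum behaviour (reads tolerate as many lost drives as the parity count, and writes tolerate one fewer when parity is exactly half of the set). The function name is hypothetical.

```python
def max_dead_drives(drives_per_set: int, parity: int) -> dict:
    """Estimate how many drives per erasure set may fail (assumed quorum rules)."""
    if drives_per_set <= 0:
        raise ValueError("drives_per_set must be positive")
    if not 0 <= parity <= drives_per_set // 2:
        raise ValueError("parity must be between 0 and half the set size")
    read_tolerance = parity
    # Assumption: with parity equal to half the set, write quorum needs one extra drive.
    write_tolerance = parity - 1 if parity * 2 == drives_per_set else parity
    return {"read": read_tolerance, "write": write_tolerance}


print(max_dead_drives(16, 4))
print(max_dead_drives(16, 8))
```

A trigger threshold would probably use the smaller of the two values, but that is a design choice to verify against your own cluster.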

I just tore down my test MinIO for a server swap. Will start testing soon. But I hope this helps?

GhaziTriki commented 2 months ago

It is a good idea to monitor each node separately. Maybe the wording of the docs wasn't clear. I have Prometheus installed on the load balancer. However, it makes sense to do it like you did. Let me give it a try.

KjellWolf commented 2 months ago

Yeah, that way every node can report metrics from the other ones.

Just note that the triggers will alert for every node: if 1 drive fails in a 4-node cluster, Zabbix will alert 4 times (once for each node).
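To illustrate the duplicate alerts, here is a small Python sketch that groups alerts by cluster and trigger name, so that four per-node alerts for the same failed drive collapse into one entry. This is only an illustration of the idea, not a Zabbix feature or part of the template; the alert fields (`cluster`, `trigger`, `host`) and the trigger name are made up.

```python
from collections import defaultdict


def group_cluster_alerts(alerts: list) -> dict:
    """Group alerts by (cluster, trigger) and collect the reporting hosts."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[(alert["cluster"], alert["trigger"])].append(alert["host"])
    return dict(grouped)


# Hypothetical case: one failed drive reported by all four nodes.
alerts = [
    {"cluster": "minio-a", "trigger": "Drive offline", "host": f"node{i}"}
    for i in range(1, 5)
]

for (cluster, trigger), hosts in group_cluster_alerts(alerts).items():
    print(f"{cluster}: {trigger} (reported by {len(hosts)} hosts)")
```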

GhaziTriki commented 2 months ago

Looks better now. One question: for {$S3.MINIO.API.URL}, do you mean the HTTPS URL of the node itself?

The following metrics are failing:

Minio S3 Software Commit Info (Hash)
MinIO S3 Cluster Objects Size Distribution BETWEEN_1_MB_AND_10_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_10_MB_AND_64_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_64_KB_AND_256_KB
MinIO S3 Cluster Objects Size Distribution BETWEEN_64_MB_AND_128_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_128_MB_AND_512_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_256_KB_AND_512_KB
MinIO S3 Cluster Objects Size Distribution BETWEEN_512_KB_AND_1_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_1024_B_AND_1_MB
MinIO S3 Cluster Objects Size Distribution BETWEEN_1024_B_AND_64_KB
MinIO S3 Cluster Objects Size Distribution GREATER_THAN_512_MB
MinIO S3 Cluster Objects Size Distribution LESS_THAN_1024_B
MinIO S3 Cluster Objects Version Distribution BETWEEN_2_AND_10
MinIO S3 Cluster Objects Version Distribution BETWEEN_10_AND_100
MinIO S3 Cluster Objects Version Distribution BETWEEN_100_AND_1000
MinIO S3 Cluster Objects Version Distribution BETWEEN_1000_AND_10000
MinIO S3 Cluster Objects Version Distribution GREATER_THAN_10000
MinIO S3 Cluster Objects Version Distribution SINGLE_VERSION
MinIO S3 Heal Time Last Activity

It would also be nice to add a tag with key "component" and value "minio" to all the metrics, to make them easy to filter.

GhaziTriki commented 2 months ago

Just for your eyes: looks perfect!

image

I will open another ticket for improvements.

KjellWolf commented 2 months ago

API URL means the LB in my setup, because my Zabbix can't reach the intercom VLAN network of the MinIO setup. But calling the nodes directly (with the fitting port) should work too. No guarantee at this point, though.

Adding the tags is a WIP; there I give you every right to complain, but it wasn't a priority until now.

Sometimes the template needs 3-4 rounds to get all data. Some data gets created over a time period or plainly just does not exist. Therefore I would need an export of all Prometheus data generated to check it.
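To check whether a failing item simply has no data yet, a Python sketch like the following could list which `range` labels actually appear in a Prometheus text export. The metric name `minio_cluster_objects_size_distribution`, the label layout, and the sample lines are assumptions for illustration; compare them against your own export before relying on this.

```python
import re

# Made-up sample in Prometheus text exposition format.
SAMPLE = "\n".join([
    "# TYPE minio_cluster_objects_size_distribution gauge",
    'minio_cluster_objects_size_distribution{range="LESS_THAN_1024_B",server="node1:9000"} 12',
    'minio_cluster_objects_size_distribution{range="BETWEEN_1_MB_AND_10_MB",server="node1:9000"} 3',
])

SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}\s+\S+')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def ranges_present(text: str, metric: str) -> set:
    """Return the set of `range` label values exported for one metric."""
    found = set()
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line)
        if match and match.group(1) == metric:
            labels = dict(LABEL.findall(match.group(2)))
            if "range" in labels:
                found.add(labels["range"])
    return found


print(sorted(ranges_present(SAMPLE, "minio_cluster_objects_size_distribution")))
```

Any range that an item expects but that is missing from the result would explain an item without data, at least until objects of that size exist.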

I'm happy about further improvements. I'll keep this issue open for the tags and for making the README clearer. Thanks for the input so far :)

KjellWolf commented 3 weeks ago

Long time no see! Sorry, I got really ill.

Have now added the component:minio tag.

Hope this resolves this conversation!