centreon / centreon-plugins

Collection of standard plugins to discover and gather cloud-to-edge metrics and status across your whole IT infrastructure.
https://www.centreon.com
Apache License 2.0

Issue with data from nimble volumes #4256

Closed: kernhsab closed this issue 1 year ago

kernhsab commented 1 year ago

Hi,

We have a strange problem with the monitoring of some Nimble volumes.

In fact, we have 4 volumes. The data are correct for volumes NVMFS-01 and NVMFS-03, but for NVMFS-02 and NVMFS-04 the only correct value returned is the space usage; all the other values remain zero.

Here is an example by requesting NVMFS-01 : /usr/lib/centreon/plugins/centreon_plugins.pl --plugin=storage::nimble::restapi::plugin --mode=volumes --insecure --hostname myhostname --api-username myusername --api-password "mypassord" --filter-name="NVMFS-01"

OK: Volume 'NVMFS-01' state: online [space usage level: normal], space used: 5.29 TB, read iops: 105, write iops: 429, read latency: 0.857 ms, write latency: 0.494 ms | 'NVMFS-01#volume.space.usage.bytes'=5818315935744B;;;0; 'NVMFS-01#volume.io.read.usage.bytespersecond'=8419805B/s;;;; 'NVMFS-01#volume.io.write.usage.bytespersecond'=7525367B/s;;;0; 'NVMFS-01#volume.io.read.usage.iops'=105iops;;;0; 'NVMFS-01#volume.io.write.usage.iops'=429iops;;;0; 'NVMFS-01#volume.io.read.latency.milliseconds'=0.857ms;;;0; 'NVMFS-01#volume.io.write.latency.milliseconds'=0.494ms;;;0;

And the same query with NVMFS-02: /usr/lib/centreon/plugins/centreon_plugins.pl --plugin=storage::nimble::restapi::plugin --mode=volumes --insecure --hostname myhostname --api-username myusername --api-password "mypassord" --filter-name="NVMFS-02"

OK: Volume 'NVMFS-02' state: online [space usage level: normal], space used: 4.12 TB, read iops: 0, write iops: 0, read latency: 0.000 ms, write latency: 0.000 ms | 'NVMFS-02#volume.space.usage.bytes'=4535013281792B;;;0; 'NVMFS-02#volume.io.read.usage.bytespersecond'=0B/s;;;; 'NVMFS-02#volume.io.write.usage.bytespersecond'=0B/s;;;0; 'NVMFS-02#volume.io.read.usage.iops'=0iops;;;0; 'NVMFS-02#volume.io.write.usage.iops'=0iops;;;0; 'NVMFS-02#volume.io.read.latency.milliseconds'=0.000ms;;;0; 'NVMFS-02#volume.io.write.latency.milliseconds'=0.000ms;;;0;

I think it may be a bug, because when I query NVMFS-02 with curl I get all the information, the same as in the web interface:

        "vol_name" : "NVMFS-02"
  "avg_stats_last_5mins" : {
     "combined_iops" : 137,
     "combined_latency" : 375,
     "combined_throughput" : 1685540,
     "read_iops" : 46,
     "read_latency" : 92,
     "read_throughput" : 620400,
     "write_iops" : 88,
     "write_latency" : 403,
     "write_throughput" : 1065138

Basically, all our volumes whose names end with even numbers don't have statistics (except space usage) with the Centreon plugin.
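For reference, the curl check above can also be scripted to list what the array reports for every volume entry. This is a hypothetical standalone sketch, not part of the plugin; it assumes the usual Nimble REST endpoints on port 5392 (POST /v1/tokens, GET /v1/volumes/detail) and placeholder hostname/credentials:

    #!/usr/bin/perl
    # Hypothetical cross-check script (not part of the plugin): list every volume
    # entry the array returns, with its 5-minute read/write IOPS, to compare with
    # what the Centreon plugin prints.
    use strict;
    use warnings;
    use JSON;
    use LWP::UserAgent;
    use HTTP::Request;

    my ($host, $user, $pass) = ('myhostname', 'myusername', 'mypassword');  # placeholders

    # Disabling certificate checks mirrors the plugin's --insecure option.
    my $ua = LWP::UserAgent->new(ssl_opts => { verify_hostname => 0, SSL_verify_mode => 0 });

    # Get a session token (assumed endpoint: POST /v1/tokens).
    my $req = HTTP::Request->new(POST => "https://$host:5392/v1/tokens");
    $req->header('Content-Type' => 'application/json');
    $req->content(encode_json({ data => { username => $user, password => $pass } }));
    my $resp = $ua->request($req);
    die 'token request failed: ' . $resp->status_line unless $resp->is_success;
    my $token = decode_json($resp->decoded_content)->{data}->{session_token};

    # Fetch the detailed volume objects (assumed endpoint: GET /v1/volumes/detail).
    $resp = $ua->get("https://$host:5392/v1/volumes/detail", 'X-Auth-Token' => $token);
    die 'volumes request failed: ' . $resp->status_line unless $resp->is_success;

    foreach my $vol (@{ decode_json($resp->decoded_content)->{data} }) {
        my $stats = $vol->{avg_stats_last_5mins} // {};
        printf "%s (id %s): read_iops=%s write_iops=%s\n",
            $vol->{name}, $vol->{id},
            $stats->{read_iops} // 0, $stats->{write_iops} // 0;
    }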

Any help would be greatly appreciated.

Best,

garnier-quentin commented 1 year ago

Could you provide the full output with the --debug option?

kernhsab commented 1 year ago

I think I have understood the problem.

All my datastores are replicated: some are active on nimble1 with replication to nimble2, and vice versa.

In the debug output, I see that all the volumes ending with even numbers are checked on the replica.

For example, for my volume named SVMFS-04, I have two entries.

The master, with some traffic:

        "vol_id" : "06742194feb3f2859e000000000000000000000018",
        "vol_name" : "SVMFS-04"
     }
  ],
  "agent_type" : "none",
  "app_category" : "Virtual Server",
  "app_uuid" : "",
  "avg_stats_last_5mins" : {
     "combined_iops" : 114,
     "combined_latency" : 399,
     "combined_throughput" : 1236475,
     "read_iops" : 37,
     "read_latency" : 105,
     "read_throughput" : 280422,
     "write_iops" : 74,
     "write_latency" : 420,
     "write_throughput" : 956049

And the replica, with no traffic:

         "vol_id" : "06742194feb3f2859e000000000000000000000019",
        "vol_name" : "SVMFS-04"
     }
  ],
  "agent_type" : "none",
  "app_category" : "Virtual Server",
  "app_uuid" : "",
  "avg_stats_last_5mins" : {
     "combined_iops" : 0,
     "combined_latency" : 0,
     "combined_throughput" : 0,
     "read_iops" : 0,
     "read_latency" : 0,
     "read_throughput" : 0,
     "write_iops" : 0,
     "write_latency" : 0,
     "write_throughput" : 0

I just don't understand why the script takes the master for the volumes ending with odd numbers and the replicated one for the others.
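One plausible explanation (an assumption about the plugin's internals, not something verified here): if the results are indexed by volume name, the entry the API returns last silently overwrites the one returned first, so whether you end up with the active volume or the zero-traffic replica depends only on the ordering. A small standalone illustration in Perl:

    #!/usr/bin/perl
    # Standalone illustration (not the plugin's actual code): two entries share the
    # same name; if results are keyed by name, the last one returned wins, so a
    # zero-traffic replica can silently replace the active volume.
    use strict;
    use warnings;

    my @api_entries = (
        { name => 'SVMFS-04', replication_role => 'synchronous_upstream',
          avg_stats_last_5mins => { read_iops => 37, write_iops => 74 } },
        { name => 'SVMFS-04', replication_role => 'synchronous_downstream',
          avg_stats_last_5mins => { read_iops => 0, write_iops => 0 } },
    );

    my %volumes;
    $volumes{ $_->{name} } = $_ for @api_entries;   # last entry wins

    printf "SVMFS-04 read_iops=%d write_iops=%d\n",
        $volumes{'SVMFS-04'}{avg_stats_last_5mins}{read_iops},
        $volumes{'SVMFS-04'}{avg_stats_last_5mins}{write_iops};
    # Prints zeros, because the downstream replica happened to come last.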

kernhsab commented 1 year ago

Hi,

I've found a way to keep only the upstream and non-replicated volumes.

In the file storage/nimble/restapi/mode/volumes.pm, I added a condition inside the foreach loop of the manage_selection sub:

    # Keep only upstream and non-replicated volumes; skip everything else.
    if ($_->{replication_role} !~ /synchronous_upstream/ && $_->{replication_role} !~ /no_replication/) {
        $self->{output}->output_add(long_msg => "skipping volume '" . $_->{name} . "': is a downstream volume.", debug => 1);
        next;
    }

or

    # Simpler variant: skip only volumes explicitly marked as downstream replicas.
    if ($_->{replication_role} eq 'synchronous_downstream') {
        $self->{output}->output_add(long_msg => "skipping volume '" . $_->{name} . "': is a downstream volume.", debug => 1);
        next;
    }

This way we keep only the upstream volumes and the ones with no replication.
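As a quick sanity check outside the plugin, the same predicate can be applied to the duplicated entries from the earlier example; with the downstream replica skipped, keying by name picks up the real statistics. Again a standalone sketch, not the actual volumes.pm code:

    #!/usr/bin/perl
    # Standalone sketch: skip downstream replicas before indexing by name, so the
    # active volume's statistics are the ones that get kept.
    use strict;
    use warnings;

    my @api_entries = (
        { name => 'SVMFS-04', replication_role => 'synchronous_upstream',
          avg_stats_last_5mins => { read_iops => 37, write_iops => 74 } },
        { name => 'SVMFS-04', replication_role => 'synchronous_downstream',
          avg_stats_last_5mins => { read_iops => 0, write_iops => 0 } },
    );

    my %volumes;
    foreach (@api_entries) {
        next if ($_->{replication_role} // '') eq 'synchronous_downstream';  # skip replicas
        $volumes{ $_->{name} } = $_;
    }

    printf "SVMFS-04 read_iops=%d write_iops=%d\n",
        $volumes{'SVMFS-04'}{avg_stats_last_5mins}{read_iops},
        $volumes{'SVMFS-04'}{avg_stats_last_5mins}{write_iops};
    # Now prints 37 and 74, the upstream volume's values.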

Best,

garnier-quentin commented 1 year ago

Could you provide the full output with --debug? You can send it to my email: qgarnier@centreon.com

garnier-quentin commented 1 year ago

https://github.com/centreon/centreon-plugins/pull/4260