IBM / spectrum-protect-sppmon

Monitoring and long-term reporting for IBM Spectrum Protect Plus. Provides a data bridge from SPP to InfluxDB and provides visualization dashboards via Grafana.
Apache License 2.0

parsing exception for ssh results when a vSNAP server is not online #84

Closed baslerj closed 2 years ago

baslerj commented 2 years ago

The following parsing error was observed:

Exception in FILE: influx_client.py, Line: 646, Exception: <class 'influxdb.exceptions.InfluxDBClientError'>
Exception Message: 400: {"error":"partial write: unable to parse 'vsnap_pools,encryption_enabled=False,id=1,name=primary,pool_type=raid0,status=Unknown,hostName=x.x.x.x,ssh_type=VSNAP compression_ratio=1.00x,deduplication_ratio=1.00x,diskgroup_size=1i,health=0i 1641493989': invalid number dropped=0"}
Some messages were lost when sending buffer for table vsnap_pools, but everything else should be OK
Storing script metrics
total of 7 exception/s occured

Running vsnap pool show on the remote system indicates there is a problem with the vsnap pool. I recommend matching cases where state != "ONLINE" and making an update that will reflect the error condition in a vsnap status report.

` sudo vsnap pool show
TOTAL: 1

ID: 1
NAME: primary
POOL TYPE: raid0
STATUS: Unknown
HEALTH: No `
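As a rough illustration of the suggested state != "ONLINE" check, the sketch below parses the pool output quoted above and flags a pool that is not online. The function names `parse_pool_show` and `pool_is_online` are hypothetical and not part of SPPMon. The label list is taken only from the output shown here, and the sketch assumes a single pool per output.

```python
import re

# Labels as they appear in the output quoted above; real output may contain more.
KNOWN_LABELS = ("ID", "NAME", "POOL TYPE", "STATUS", "HEALTH")


def parse_pool_show(output: str) -> dict:
    """Split (possibly flattened) `vsnap pool show` text into {label: value}.

    Hypothetical helper: assumes one pool per output and only the known labels.
    """
    labels = "|".join(re.escape(label) for label in KNOWN_LABELS)
    # A value runs until the next known label or the end of the text.
    pattern = re.compile(
        rf"\b({labels}):\s*(.*?)\s*(?=\b(?:{labels}):|$)", re.S
    )
    return {label: value for label, value in pattern.findall(output)}


def pool_is_online(pool: dict) -> bool:
    """Return True only when the reported status is ONLINE (case-insensitive)."""
    return pool.get("STATUS", "").upper() == "ONLINE"


sample = (
    "sudo vsnap pool show TOTAL: 1\n\n"
    "ID: 1 NAME: primary POOL TYPE: raid0 STATUS: Unknown HEALTH: No"
)
pool = parse_pool_show(sample)
if not pool_is_online(pool):
    print(f"pool {pool.get('NAME')} is not online: status={pool.get('STATUS')}")
```

A check like this could decide whether to report an error condition for the pool instead of sending its statistics on unchanged.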

NielsKorschinsky commented 2 years ago

@baslerj It seems the issue is that the compression ratio and dedup ratio now include a trailing x instead of being pure floats.
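To make that diagnosis concrete, here is a small sketch that checks which field values in the rejected record are not valid InfluxDB line protocol literals. The regular expressions only approximate the documented literal forms (floats, integers with a trailing `i`, booleans, quoted strings), and `invalid_fields` is a hypothetical helper, not an SPPMon or influxdb-python function. It also assumes the record contains no escaped spaces or commas, which holds for the record in the error message.

```python
import re

# Approximation of the field value forms accepted by InfluxDB 1.x line protocol.
FLOAT = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$")
INTEGER = re.compile(r"^[+-]?\d+i$")
BOOLEAN = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}


def invalid_fields(line: str) -> list:
    """Return the names of fields whose values are not valid literals.

    Assumes the layout "<measurement,tags> <fields> <timestamp>" without
    escaped spaces or commas.
    """
    parts = line.split(" ")
    field_set = parts[1]
    bad = []
    for pair in field_set.split(","):
        key, _, value = pair.partition("=")
        is_string = len(value) >= 2 and value.startswith('"') and value.endswith('"')
        if not (FLOAT.match(value) or INTEGER.match(value)
                or value in BOOLEAN or is_string):
            bad.append(key)
    return bad


record = (
    "vsnap_pools,encryption_enabled=False,id=1,name=primary,pool_type=raid0,"
    "status=Unknown,hostName=x.x.x.x,ssh_type=VSNAP "
    "compression_ratio=1.00x,deduplication_ratio=1.00x,diskgroup_size=1i,health=0i "
    "1641493989"
)
print(invalid_fields(record))
```

For the record from the error message, only the two ratio fields are reported, which matches the "invalid number" rejection.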

A few important questions:

If the values are valid, I would simply strip the x so these stats can be processed again. If the 1.00 values are placeholders, I will drop the invalid data points.
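Both options could look roughly like the sketch below: strip a trailing x when the remainder is a usable number, and drop the data point otherwise. The names `parse_ratio` and `clean_pool_record` and the record layout are assumptions for illustration, not the actual SPPMon code.

```python
import math
from typing import Optional


def parse_ratio(raw: str) -> Optional[float]:
    """Convert a ratio such as "1.00x" to a float, or None if it is unusable."""
    text = raw.strip()
    if text.lower().endswith("x"):
        text = text[:-1]
    try:
        value = float(text)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treat those as invalid here.
    return value if math.isfinite(value) else None


def clean_pool_record(
    record: dict,
    ratio_keys=("compression_ratio", "deduplication_ratio"),
) -> Optional[dict]:
    """Return a copy with numeric ratios, or None to drop the data point."""
    cleaned = dict(record)
    for key in ratio_keys:
        if key in cleaned:
            value = parse_ratio(str(cleaned[key]))
            if value is None:
                return None
            cleaned[key] = value
    return cleaned


print(clean_pool_record({
    "name": "primary",
    "compression_ratio": "1.00x",
    "deduplication_ratio": "1.00x",
}))
```

Whether a value of 1.00 should be kept or treated as a placeholder is still the open question above; the sketch only covers the parsing side.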

NielsKorschinsky commented 2 years ago

@baslerj I tried to recreate this issue with a DEGRADED state, but it seems the issue is either fixed or highly dependent on the UNKNOWN status. Due to PR #87, the vsnap API will most likely only be called when it is fully or mostly available, which prevents an UNKNOWN state. If this error occurs again, a hotfix can be deployed; implementing it now would only bloat the code unnecessarily.