IBM / spectrum-protect-sppmon

Monitoring and long-term reporting for IBM Spectrum Protect Plus. Provides a data bridge from SPP to InfluxDB and provides visualization dashboards via Grafana.
Apache License 2.0

SPPCheck: Interpolate multiple missing values #116

Status: Open. NielsKorschinsky opened this issue 1 year ago.


Current state:

When preparing the data, at most one consecutive missing data point is interpolated. The reasoning is that assuming multiple data points can distort the summarized data, especially for count values: a zero count is not permitted and is internally handled as NA, so interpolating over it would inflate the summarized count. However, how realistic are zero counts in a production system?
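The count-inflation concern can be illustrated with a small sketch. It assumes that a zero count is stored as a missing value (`None` standing in for NA) and that an isolated gap is filled with the mean of its two neighbours; `fill_single_gaps` and the sample counts are hypothetical and not taken from SPPCheck.

```python
from typing import List, Optional


def fill_single_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill isolated missing points (runs of length 1) with the mean of their neighbours.

    Hypothetical helper: mirrors the "at most one point in a row" rule, nothing more.
    """
    out = list(values)
    for i in range(1, len(values) - 1):
        # Only fill if both neighbours are known, i.e. the gap is exactly one point long.
        if values[i] is None and values[i - 1] is not None and values[i + 1] is not None:
            out[i] = (values[i - 1] + values[i + 1]) / 2
    return out


true_counts = [5, 0, 5]  # hypothetical: the middle interval really had zero events
stored = [None if c == 0 else c for c in true_counts]  # zero counts are stored as NA
filled = fill_single_gaps(stored)
print(sum(true_counts), sum(filled))  # 10 15.0
```

The interpolated value replaces a real zero with 5, so the summarized count grows from 10 to 15. This only matters if zero counts actually occur, which is exactly the open question above.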

The major issue with not interpolating these intermediate values is that the resulting gaps can greatly affect the summarized prediction. If a single system or multiple systems fail to report, the summed trend for this period is lower than before and after, which misleads the prediction function into seeing a decrease followed by an increase in the trend. It might be wise to interpolate any number of consecutive missing points, as long as there is a following data point (or multiple following points? That would be hard to implement).
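The difference between the current rule and the proposed one can be sketched as a single gap-length limit. This is a minimal sketch, assuming linear interpolation, missing values represented as `None`, and a summarization that simply sums the available systems per timestamp. `interpolate_gaps`, `summarize`, `max_gap` and the sample series are hypothetical names and data, not SPPCheck's actual code. The "following data point" condition is modelled as requiring a known value on both sides of the gap.

```python
from typing import List, Optional

Series = List[Optional[float]]


def interpolate_gaps(values: Series, max_gap: Optional[int] = 1) -> Series:
    """Linearly interpolate runs of missing values (None) between two known points.

    A run is filled only if it has a known value before and after it and, when
    max_gap is set, is at most max_gap points long. Runs at the start or end of
    the series, and runs longer than max_gap, stay missing.
    """
    result = list(values)
    n = len(result)
    i = 0
    while i < n:
        if result[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this run
        while i < n and result[i] is None:
            i += 1
        end = i  # first index after the run
        gap = end - start
        bounded = start > 0 and end < n  # needs a preceding and a following data point
        if bounded and (max_gap is None or gap <= max_gap):
            left, right = result[start - 1], result[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                result[start + k] = left + step * (k + 1)
    return result


def summarize(series: List[Series]) -> List[float]:
    """Sum all systems per timestamp, skipping missing points (assumed summarization)."""
    return [sum(v for v in column if v is not None) for column in zip(*series)]


system_a = [10, 10, 10, 10, 10, 10]
system_b = [20, 21, None, None, None, 25]  # three consecutive points missing

for max_gap in (1, None):
    prepared = [interpolate_gaps(s, max_gap) for s in (system_a, system_b)]
    print(max_gap, summarize(prepared))
# 1 [30, 31, 10, 10, 10, 35]
# None [30, 31, 32.0, 33.0, 34.0, 35]
```

With `max_gap=1` the three-point gap stays unfilled and the summed total drops from 31 to 10 and jumps back to 35, the artificial dip described above. With `max_gap=None` any bounded gap is filled and the total rises smoothly.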

The individual predictions are not affected, as their data is interpolated beforehand as a safety measure; however, this results in error spam in the log messages (see #113).

[Image: latest_total_summarized_storage_data]

Excerpt from master's thesis:

Due to an unknown reason, some of the vSnaps associated with the testing system were unreachable. 
Therefore, no data could be collected by SPPMon and the data points are missing. 
According to the guidelines of the predictor, when preparing data, a maximum of one data point can be interpolated in a row (see Section 4.2.3).
This guideline results in missing data from four of seven vSnaps from 04.05.2022 until 07.06.2022. 
These missing values influence the total trend, though their effect on the long-term trend is negligible.
The missing values are not directly apparent in Grafana because it automatically connects the remaining data points, though when inspecting the data, the missing values become apparent.