NetManAIOps / Bagel

IPCCC 2018: Robust and Unsupervised KPI Anomaly Detection Based on Conditional Variational Autoencoder

Handling missing data #4

Closed GSav90 closed 3 years ago

GSav90 commented 4 years ago

Hi guys,

Thanks for creating this package. It works well on datasets with no missing values, but one of the datasets I was testing had missing timestamps. My data is KPI data, i.e., sales every 5 minutes in a store.

Your KPI class has a property called missing_value, which is causing my raw data to become negative. For example, if the data point for 19-June-2020 3:35:00 pm is missing, its value gets replaced by a negative number. That negative number comes from this property:

    @property
    def missing_value(self):
        return self.value[self.missing == 1][0] if np.count_nonzero(self.missing) > 0 else 2 * np.min(self.value) - np.max(self.value)

This property makes my raw data negative, when in reality my sales can never be negative.

  1. Could you tell me what the purpose of this property is?
  2. To fix this issue, should I pass an array to the missing parameter (which defaults to None) when constructing KPISeries?
    • If yes, what should that array contain: the values or the timestamps of the missing points?

Let me know how you've designed it. I am sure I am missing something here. I appreciate your help!

lizeyan commented 3 years ago

This property reports the value that is imputed for missing points. If there are missing points, it returns the value that was actually imputed. If there are none, it just returns a very low value.
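To make the two branches concrete, here is a minimal standalone sketch that mirrors the expression quoted in the question. It is not Bagel's code: the free function and the sample numbers are illustrative assumptions, and only the formula itself comes from the thread.

```python
import numpy as np

def missing_value(value, missing):
    """Standalone mirror of the property quoted above (illustrative, not Bagel's class)."""
    value = np.asarray(value, dtype=float)
    missing = np.asarray(missing)
    if np.count_nonzero(missing) > 0:
        # Some points are flagged as missing: report the value stored at the first one.
        return value[missing == 1][0]
    # Nothing is flagged: fall back to a value below the observed minimum.
    return 2 * np.min(value) - np.max(value)

# Made-up sales figures, no points flagged as missing: 2 * 10 - 30
no_gaps = missing_value([10.0, 20.0, 30.0], [0, 0, 0])

# Same series with the middle point flagged and already filled with 0
with_gaps = missing_value([10.0, 0.0, 30.0], [0, 1, 0])
```

With no points flagged, the fallback is negative whenever the maximum is more than twice the minimum, which is typical for sales data that is never below zero.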

lizeyan commented 3 years ago

You can manually impute your time series with zero before constructing a KPISeries.
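A minimal sketch of that zero imputation using only NumPy. The assumptions here are mine, not stated in the thread: timestamps are integer seconds aligned to a fixed 5-minute (300 s) grid, and `fill_gaps_with_zero` is a hypothetical helper name that is not part of Bagel.

```python
import numpy as np

def fill_gaps_with_zero(timestamp, value, interval=300):
    """Hypothetical helper: rebuild a regular grid and put 0 where points are absent.

    Assumes integer timestamps (seconds) that already lie on the `interval` grid.
    """
    timestamp = np.asarray(timestamp, dtype=np.int64)
    value = np.asarray(value, dtype=float)
    order = np.argsort(timestamp)
    timestamp, value = timestamp[order], value[order]

    # Every slot from the first to the last observed timestamp, inclusive.
    full_timestamp = np.arange(timestamp[0], timestamp[-1] + interval, interval)
    slot = (timestamp - timestamp[0]) // interval

    full_value = np.zeros(len(full_timestamp))       # absent slots stay at 0
    missing = np.ones(len(full_timestamp), dtype=int)  # 1 = slot was absent
    full_value[slot] = value
    missing[slot] = 0
    return full_timestamp, full_value, missing

# Made-up example: the point at t=600 is absent from the raw data.
ts, filled, mask = fill_gaps_with_zero([0, 300, 900, 1200], [12.0, 7.0, 9.0, 4.0])
```

The filled arrays can then be used to construct the KPISeries. The mask is returned only for reference; whether it should also be passed as the missing argument is not settled in this thread.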