filipwastberg closed this issue 4 years ago
The function model.predict(t, x) expects t to be a datetime and x to be a numpy array representing a feature vector (i.e. a data point).
From what I can see, the index of your dataframe is an integer. Therefore, at each iteration of the loop for t, x in zip(df.index, df.values): ... the value of your t is an integer (and not a datetime as expected), and your data point x has a timestamp included in it (while it is expected to be a feature vector, without time).
A simple way to fix your code is to use the timestamp column as the index. Here is a working version (with comments added where I changed something):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
from grand import IndividualAnomalyInductive, IndividualAnomalyTransductive, GroupAnomaly
df = pd.read_csv("simulated_data.csv", parse_dates=True, header = 0, index_col=0) # Added index_col=0
df.index = pd.to_datetime(df.index) # The timestamps are now our index column
df.columns = ["value"] # the columns (features) excluding the index
df.plot() # there is no column "timestamp" now, it's the index
plt.show()
model = IndividualAnomalyTransductive(ref_group = ["day-of-week"], w_martingale = 100)
# You can also try with "season-of-year" as the periodicity in your data seems seasonal
# model = IndividualAnomalyTransductive(ref_group = ["season-of-year"], w_martingale = 100)
for t, x in zip(df.index, df.values):
    info = model.predict(t, x)
    print("Time: {} ==> strangeness: {}, deviation: {}".format(t, info.strangeness, info.deviation), end="\r")
# Just added this line to see the results
model.plot_deviations(figsize=(12, 8), plots=["data", "strangeness", "deviation", "pvalue", "threshold"])
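If the dataframe is already loaded with an integer index, the same fix can be applied after the fact. The following is a minimal sketch using only pandas; the small in-memory dataframe is a hypothetical stand-in for simulated_data.csv, and it assumes the time column is named "timestamp". It shows what the loop yields before and after the timestamps become the index.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: a "timestamp" column plus one feature.
raw = pd.DataFrame({
    "timestamp": ["2019-01-01", "2019-01-02", "2019-01-03"],
    "value": [1.0, 2.5, 0.7],
})

# With the default integer index, t is an integer and x still contains
# the timestamp next to the feature value.
t_bad, x_bad = next(zip(raw.index, raw.values))

# Convert the column to datetimes and make it the index.
df = raw.assign(timestamp=pd.to_datetime(raw["timestamp"])).set_index("timestamp")

# Now t is a pandas Timestamp (a datetime subclass) and x holds only the features.
t_ok, x_ok = next(zip(df.index, df.values))

print(type(t_bad).__name__, x_bad.shape)
print(type(t_ok).__name__, x_ok.shape)
```

The resulting df can then be passed to the same for t, x in zip(df.index, df.values) loop as above.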
Regarding your second question, the expected input of IndividualAnomalyTransductive() is as described in the example Notebook:
model = IndividualAnomalyTransductive(
    ref_group = ["day-of-week"],  # Criteria to use to construct reference data (check the notebook examples to see other possible criteria to use).
    external_percentage = 0.3,    # Percentage of samples to pick from historical data in the case where ref_group is set to "external".
    # The following parameters are the same as in IndividualAnomalyInductive
    non_conformity = "knn",       # Strangeness measure, e.g. "knn" or "median"
    k = 20,                       # Used if non_conformity is "knn"
    w_martingale = 15,            # Window size used for computing the deviation level
    dev_threshold = 0.6,          # Threshold on the deviation level (in [0, 1])
    columns = None                # Optional feature names (for interpreting the results)
)
There is no other documentation for the moment besides the explanations given in the example Notebook. However, more will come in the near future.
That's great. Thanks. I really think that some documentation of the functions would be a great feature.
Furthermore, I think it would be great if we could install the package with pip install git+https://github.com/caisr-hh/group-anomaly-detection, instead of having to clone the whole project and then install it. Is that something you are considering?
I successfully installed your package and managed to run through the example. However, when trying it with simulated data I get an error message.
And the error message:
What is the expected input in IndividualAnomalyTransductive(), and is there any specific documentation besides the example Notebook?