Hi @arthemis911222

I do not understand how you created seismic-new.csv, so I am just using the normal seismic.csv to explain why `side="negative"` does not detect anomalies.
The reason it will not detect any anomalies when set to `side="negative"` is that, internally, the algorithms in use do not resolve any anomalies due to the nature of the data. In this specific case, the output of `VolatilityShiftAD` is determined by the product of the `iqr_ad` and `sign_check` inputs:
"and": {
"model": AndAggregator(),
"input": ["iqr_ad", "sign_check"],
},
The model `AndAggregator()` identifies a time point as anomalous only if it is included in all the input anomaly lists. The important part is "only if it is included in all the input anomaly lists", meaning it must equal 1 in both `iqr_ad` and `sign_check`.
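The following minimal pandas sketch shows the same AND logic on two made-up binary series. `and_aggregate` is a hypothetical helper, not adtk's `AndAggregator` implementation. It treats missing values as 0, which may differ from how adtk handles NaN.

```python
import pandas as pd


def and_aggregate(*flags: pd.Series) -> pd.Series:
    """Return 1 where every input flag series is 1, else 0.

    Simplified stand-in for the AND step; missing values are treated as 0.
    """
    combined = pd.concat(flags, axis=1).fillna(0).astype(bool)
    return combined.all(axis=1).astype(int)


# Toy flags: iqr_ad fires at positions 1-3, sign_check only at 0-1
idx = pd.date_range("2020-01-01", periods=5, freq="min")
iqr_ad = pd.Series([0, 1, 1, 1, 0], index=idx, name="iqr_ad")
sign_check = pd.Series([1, 1, 0, 0, 0], index=idx, name="sign_check")

print(and_aggregate(iqr_ad, sign_check).tolist())  # [0, 1, 0, 0, 0]
```

Only the single position where both flags are 1 survives. If the two flags fired at completely different times, the AND result would be 0 everywhere.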
We can run through each of these steps to see why it does not detect anomalies with `side="negative"`. Starting from the beginning:
```python
import pandas as pd
from adtk.data import validate_series
from adtk.visualization import plot
from adtk.detector import VolatilityShiftAD

csv_file = '/tmp/adtk/docs/notebooks/data/seismic.csv'
# read_csv(squeeze=True) was removed in pandas 2.0; squeeze the DataFrame instead
s = pd.read_csv(csv_file, index_col="Time", parse_dates=True).squeeze("columns")
s = validate_series(s)

# Detect volatility shifts in both directions
volatility_shift_ad = VolatilityShiftAD(c=6.0, side='both', window=20)
anomalies = volatility_shift_ad.fit_detect(s)
plot(s, anomaly=anomalies, anomaly_color='red')
```
As expected, a volatility shift was detected.
Let us now demonstrate the issue you are describing.
```python
s = pd.read_csv(csv_file, index_col="Time", parse_dates=True).squeeze("columns")
s = validate_series(s)

# Same detector, but only looking for decreases in volatility
volatility_shift_ad = VolatilityShiftAD(c=6.0, side='negative', window=20)
anomalies = volatility_shift_ad.fit_detect(s)
plot(s, anomaly=anomalies, anomaly_color='red')
```
As you said, no volatility shift is detected. You would naturally expect it to detect one where the troughs reach the -50 and -100 range, but that is not the case.
If we look at the methods and the data, we can see why no anomaly is detected. Let us start with `side="positive"`:
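```python
pipnet = VolatilityShiftAD(window=20, side="positive")
s = pd.read_csv(csv_file, index_col="Time", parse_dates=True).squeeze("columns")
s = validate_series(s)
# Run the detector's internal Pipenet directly; return_intermediate=True
# returns the output of each step (diff_abs, iqr_ad, diff, sign_check, ...)
anomalies = pipnet.pipe_.fit_detect(s, return_intermediate=True)
```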
First, let us look at the `iqr_ad` values. These are calculated by `InterQuartileRangeAD` using `diff_abs` as the input data (`diff_abs` being the result of running the series through `DoubleRollingAggregate`). Here is `diff_abs`:
```python
plot(anomalies["diff_abs"])
```
`diff_abs` is then used to calculate the `iqr_ad` values:
```python
plot(anomalies["iqr_ad"])
```
As we can see, the value is 1 around the anomaly region.
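For intuition, the interquartile-range rule behind `InterQuartileRangeAD` can be sketched in a few lines of pandas. This is a simplified stand-in: the helper `iqr_flags`, the toy data and `c=3.0` are illustrative only. It computes the quartiles and flags on the same data, as `fit_detect` does. adtk's own bounds and defaults for this step may differ.

```python
import pandas as pd


def iqr_flags(s: pd.Series, c: float = 3.0) -> pd.Series:
    """Flag values outside [Q1 - c*IQR, Q3 + c*IQR] as 1, others as 0.

    Hypothetical helper illustrating the IQR rule; not adtk's code.
    """
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    outside = (s < q1 - c * iqr) | (s > q3 + c * iqr)
    return outside.astype(int)


# Toy diff_abs-like values with one large spike
diff_abs = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])
print(iqr_flags(diff_abs).tolist())  # [0, 0, 0, 0, 0, 1]
```

Here Q1 = 2.25 and Q3 = 4.75, so IQR = 2.5 and the upper bound is 4.75 + 3 * 2.5 = 12.25. Only the spike exceeds it.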
Now let us look at `sign_check`. It is calculated by `ThresholdAD` using `diff` as the input data (`diff` also being the result of running the series through `DoubleRollingAggregate`):
```python
plot(anomalies["diff"])
```
And now the `sign_check` result:
```python
plot(anomalies["sign_check"])
```
All 1s.
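The role of `sign_check` can be sketched as a simple threshold on the sign of `diff`. This sketch assumes that `side="positive"` keeps points where `diff > 0` (volatility increasing) and `side="negative"` keeps points where `diff < 0`. `sign_flags` is a hypothetical helper, not the actual `ThresholdAD` configuration inside adtk.

```python
import pandas as pd


def sign_flags(diff: pd.Series, side: str) -> pd.Series:
    """Flag 1 where the rolling-std difference has the requested sign.

    Hypothetical helper: 'positive' keeps increases in volatility (diff > 0),
    'negative' keeps decreases (diff < 0).
    """
    if side == "positive":
        return (diff > 0).astype(int)
    if side == "negative":
        return (diff < 0).astype(int)
    raise ValueError("side must be 'positive' or 'negative'")


diff = pd.Series([0.5, 2.0, -1.5, -0.2, 0.0])
print(sign_flags(diff, "positive").tolist())  # [1, 1, 0, 0, 0]
print(sign_flags(diff, "negative").tolist())  # [0, 0, 1, 1, 0]
```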
Now let's plot `iqr_ad` and `sign_check` together:
```python
iqr_ad = anomalies['iqr_ad']
sign_check = anomalies['sign_check']
data = {'iqr_ad': iqr_ad, 'sign_check': sign_check}
df_to_plot = pd.DataFrame(data)
df_to_plot.plot(figsize=(18, 6))
```
You can see that they are both equal to 1 where the anomaly was detected.
Now let us do the same with `side="negative"`:
```python
pipnet = VolatilityShiftAD(window=20, side="negative")
s = pd.read_csv(csv_file, index_col="Time", parse_dates=True).squeeze("columns")
s = validate_series(s)
anomalies = pipnet.pipe_.fit_detect(s, return_intermediate=True)
```
Here `diff_abs`, `iqr_ad` and `diff` are the same as for `side="positive"`, but `sign_check` is different:
```python
plot(anomalies["sign_check"])
```
If we plot `iqr_ad` and `sign_check` together, we can see that there is no point where they both equal 1.
```python
iqr_ad = anomalies['iqr_ad']
sign_check = anomalies['sign_check']
data = {'iqr_ad': iqr_ad, 'sign_check': sign_check}
df_to_plot = pd.DataFrame(data)
df_to_plot.plot(figsize=(18, 6))
```
The algorithms are working as expected. It just so happens that the ensemble of algorithms does not agree that there are anomalous negative changes. The historical interquartile range places the volatility shift before the point where you might expect anomalous negative changes to have triggered. By the time the troughs drop down to the -50 and -100 range, the historical interquartile range (`iqr_ad`) has dropped back down to 0. So no matter how volatile `diff`/`sign_check` may be, the point is not anomalous in terms of the definitions used by `VolatilityShiftAD`.
I hope this explains the issue for you. With regard to your Excel question, I have no comment.
@earthgecko Thanks for your answer, but I have some more questions.
`VolatilityShiftAD` detects anomalies where the `std` values of the left and right sliding windows are very different. This holds regardless of whether the time series goes from 'smooth' (small std) to 'rough' (large std) or from 'rough' to 'smooth', is that right? Is it similar to `LevelShiftAD`?
VolatilityShiftAD detects shift of volatility level by tracking the difference between standard deviations at two sliding time windows next to each other.
In my understanding, the smooth-to-rough case is like the anomaly in 'seismic.csv'. I created a rough-to-smooth anomaly by copying data to the end of 'seismic.csv'. The std of the copied data is small, while the std of the data immediately to its left is big (maybe this is not a good way to do it). For example:
At the created anomaly, the std values of the left and right windows are also very different, but `VolatilityShiftAD` cannot detect it. Why?
I wanted to find the answer, so I read the code for `VolatilityShiftAD` and plotted 'diff_abs':
```python
agg = "std"
self.pipe_ = Pipenet(
    {
        "diff_abs": {
            "model": DoubleRollingAggregate(
                agg=agg,
                window=window,
                center=True,
                min_periods=min_periods,
                diff="abs_rel_diff",
            ),
        ...
```
In my understanding, the std values of the two windows near the created anomaly are very different, so this should clearly show up in "diff_abs". But it does not. Why? Looking forward to your reply, thanks!
@arthemis911222 Thanks for describing how you created your data set. I have reproduced it and used it in the following explanation.
The issue you are experiencing is probably because your method does not calculate the diff in the same way `DoubleRollingAggregate` does. I suspect that your method is just calculating `diff`.
Here, step by step, is what the different diff methods produce on your created anomaly time series.
```python
import pandas as pd
from adtk.transformer import DoubleRollingAggregate, RollingAggregate
from adtk.visualization import plot

# `s` is the series containing the created (rough -> smooth) anomaly
window = 20
s_copy = s.copy()
s_transformed = DoubleRollingAggregate(
    agg='std',
    window=window,
    center=True,
    min_periods=None,
    diff="abs_rel_diff",
).transform(s_copy).rename("Diff double rolling std (mm)")
plot(pd.concat([s_copy, s_transformed], axis=1))
```
This is what `VolatilityShiftAD` calculates by default ^^
The `DoubleRollingAggregate` steps are:
```python
# Left window: std of the `window` points before each timestamp (hence shift(1))
s_copy = s.copy()
s_rolling_left = RollingAggregate(
    agg='std',
    window=window,
    center=False,
    min_periods=None,
).transform(s_copy.shift(1)).rename("rolling left - std (mm)")
plot(s_rolling_left)
```

Rolling left window ^^
```python
# Right window: roll over the reversed series, then reverse the result back
s_rolling_right = pd.Series(
    RollingAggregate(
        agg='std',
        window=20,
        center=False,
    )
    .transform(s_copy.iloc[::-1])
    .iloc[::-1]
)
s_rolling_right.name = "rolling right - std (mm)"
plot(s_rolling_right)
```

Rolling right window ^^
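The same left and right windows can be approximated with plain pandas rolling windows. This may help when checking your own calculations, for example against Excel. The sketch rests on a few assumptions. `left_right_std` is a hypothetical helper. It uses pandas' default sample standard deviation (`ddof=1`), and it does not mirror adtk's `min_periods` handling, so small differences from adtk are possible.

```python
from typing import Tuple

import pandas as pd


def left_right_std(s: pd.Series, window: int) -> Tuple[pd.Series, pd.Series]:
    """Rolling std of the `window` points before t and of the `window` points from t on.

    Hypothetical pure-pandas helper; not adtk's implementation.
    """
    # Left window: the `window` values strictly before t (hence shift(1))
    left = s.shift(1).rolling(window).std()
    # Right window: t and the next window - 1 values, via reverse, roll, reverse
    right = s.iloc[::-1].rolling(window).std().iloc[::-1]
    return left, right


# Toy series: a quiet stretch followed by a volatile one
s = pd.Series([0.0, 0.0, 0.0, 0.0, 10.0, -10.0, 10.0, -10.0])
left, right = left_right_std(s, window=4)
print(pd.DataFrame({"left": left, "right": right}))
```

At the boundary (position 4), the left window is flat (std 0) and the right window is volatile. The relative difference `abs(right - left) / left` would divide by zero at that point on this toy data.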
Now, the part which I suspect you may not be replicating in your method is the `diff_abs` calculation.
We can now look at the results of the different diff methods.
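```python
# Relative absolute difference (diff="abs_rel_diff"), used for diff_abs
diff_abs = abs(s_rolling_right - s_rolling_left) / s_rolling_left
plot(diff_abs.rename('diff_abs'))

# Plain difference (diff="diff"), used for diff / sign_check
diff = s_rolling_right - s_rolling_left
plot(diff.rename('diff'))
```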
If you want to reproduce the results, make sure you use `abs(s_rolling_right - s_rolling_left) / s_rolling_left` to calculate your diff.
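A quick worked example with made-up window standard deviations (1 and 10) shows why the two transitions look so different in `diff_abs`. `abs_rel_diff` and `plain_diff` are hypothetical helpers that restate the two formulas above for single numbers.

```python
def abs_rel_diff(left_std: float, right_std: float) -> float:
    """|right - left| / left, the formula used for diff_abs above."""
    return abs(right_std - left_std) / left_std


def plain_diff(left_std: float, right_std: float) -> float:
    """right - left, the formula used for diff above."""
    return right_std - left_std


# Smooth -> rough: quiet left window (std 1), volatile right window (std 10)
print(abs_rel_diff(1.0, 10.0))  # 9.0
# Rough -> smooth: volatile left window (std 10), quiet right window (std 1)
print(abs_rel_diff(10.0, 1.0))  # 0.9
# The plain difference has the same magnitude in both cases; only the sign flips
print(plain_diff(1.0, 10.0), plain_diff(10.0, 1.0))  # 9.0 -9.0
```

Because the denominator is the left-window std, a quiet-to-volatile change is amplified. A volatile-to-quiet change can never exceed 1, since `(left - right) / left` is at most 1 when `right < left`. That makes it much less likely to stand out in `iqr_ad`.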
I hope this explains why. Good luck.
And even if the `VolatilityShiftAD` class were modified to use `diff` instead of `abs_rel_diff` in the `diff_abs` step, the outcome would probably not be what you assume it would be:
"diff_abs": {
"model": DoubleRollingAggregate(
agg=agg,
window=window,
center=True,
min_periods=min_periods,
diff="diff",
),
"input": "original",
},
I get it! The std of the left window is small at the first anomaly but big at the second, and that affects the result. Thanks a lot! @earthgecko
I used `VolatilityShiftAD` with `side` set to both/positive/negative to see how they differ, but the results are exactly the same. `VolatilityShiftAD` cannot detect the negative anomaly.
('seismic-new.csv': the anomaly data from 'seismic.csv' copied to the end of the file)
anomalies:
diff_abs(std):
I used Excel to calculate the std at two times, '08-05 15:05:00' (left window < 15:05:00, right window > 15:05:00) and '08-05 16:56:00'. The results differ from those of `VolatilityShiftAD`.