Open ghost opened 8 years ago
R returns 55 points instead of 56 because the R implementation runs a Dickey-Fuller stationarity test and, if needed, takes an extra step to make the time series stationary: it takes first differences, which drops one point. However, I think the R version of the Dickey-Fuller test has a bug; see my comments at https://github.com/Netflix/Surus/issues/14. If you turn off this test in both R and Java, you should get consistent results.
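To illustrate why differencing costs one observation, here is a minimal Java sketch. It is not the RAD or Surus code: the class `FirstDifference` and the method `difference` are hypothetical names used only for this example, and the series in `main` is a placeholder rather than the data from this issue.

```java
// Hypothetical helper, not part of RAD/Surus: shows that first differencing
// a series of n points yields n - 1 points.
final class FirstDifference {

    // Returns d[i] = x[i + 1] - x[i]; the result is one element shorter than x.
    static double[] difference(double[] x) {
        if (x.length < 2) {
            return new double[0];
        }
        double[] d = new double[x.length - 1];
        for (int i = 0; i < d.length; i++) {
            d[i] = x[i + 1] - x[i];
        }
        return d;
    }

    public static void main(String[] args) {
        // Placeholder series with 56 observations (values are arbitrary).
        double[] ts = new double[56];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = 0.5 * i;
        }
        double[] diffed = difference(ts);
        System.out.println(ts.length + " -> " + diffed.length); // prints "56 -> 55"
    }
}
```

If the R code differences the series and the Java code does not, the two S vectors describe different inputs (changes between points versus the points themselves), so neither their lengths nor their values should be expected to line up.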
I have a time series (56 observations) like this:
When I run it in R:
library(RAD)
data = c(3.197097, 3.029077, 3.005744, 2.969745, 2.988609, 2.97782, 2.933626, 3.185347, 3.241275, 3.117891, 3.071268, 3.118897, 3.152572, 3.232348, 3.424237, 3.323964, 3.302709, 3.341312, 3.341527, 3.375134, 3.543823, 3.879864, 3.420371, 3.294217, 3.49587, 3.521571, 3.599039, 3.925218, 3.99248, 3.689928, 3.749015, 3.583267, 3.704804, 3.742834, 3.599793, 3.699821, 3.630572, 3.684399, 3.725435, 3.743818, 3.744296, 3.667758, 3.899343, 3.724631, 3.551779, 3.557395, 3.748661, 3.569791, 3.520395, 3.529122, 3.604996, 3.623308, 3.586358, 3.793575, 3.837355, 3.753702)
a=AnomalyDetection.rpca(data, frequency = 7)
S_matrix=a$S_transform
View(data.frame(S_matrix))
It returns a vector of length 55 (one fewer than the number of observations):
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.130887399206267, 0, 0, 0, 0, 0.00318375301259443, 0, -0.0885939624397428, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.00646893256411638, 0, 0)
That gives 4 anomaly points.
When it comes to Java:
I just changed the input in the RAD_test file.
double[] ts = new double[] { 3.197097, 3.029077, 3.005744, 2.969745, 2.988609, 2.97782, 2.933626, 3.185347, 3.241275, 3.117891, 3.071268, 3.118897, 3.152572, 3.232348, 3.424237, 3.323964, 3.302709, 3.341312, 3.341527, 3.375134, 3.543823, 3.879864, 3.420371, 3.294217, 3.49587, 3.521571, 3.599039, 3.925218, 3.99248, 3.689928, 3.749015, 3.583267, 3.704804, 3.742834, 3.599793, 3.699821, 3.630572, 3.684399, 3.725435, 3.743818, 3.744296, 3.667758, 3.899343, 3.724631, 3.551779, 3.557395, 3.748661, 3.569791, 3.520395, 3.529122, 3.604996, 3.623308, 3.586358, 3.793575, 3.837355, 3.753702};
I got the following S_matrix from the observed data:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.129439154052943, 0, -0.0123981691519606, 0, 0, 0, 0.168267483707591, 0.156119414262503, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.104831613422113, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
In this case, we get 5 anomaly points. As you can see, the Java version returns a vector that is very different from the R version. Apart from the length (55 in R versus 56 in Java, which doesn't matter), the values themselves differ a lot. Each version gives me a different result (4 versus 5 anomaly points), and that confuses me.
I hope you can help me out.
Thank you so much.