business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

gesd() does not implement the GESD test #34

Open ksvanhorn opened 5 years ago

ksvanhorn commented 5 years ago

The documentation for anomalize::gesd() states that it implements the GESD method, and references @raunakms's gesd() function. But whereas the GESD method and @raunakms's gesd() function compute the test statistic R_i as the maximum over the remaining observations of

|x_i - mean(x)| / sd(x)

anomalize::gesd() uses

|x_i - median(x)| / mad(x)

Whatever the pros and cons of this modification, the result is NOT the GESD method, and is NOT the same as @raunakms's gesd().
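To make the difference concrete, here is a minimal base-R sketch that computes both statistics on a small made-up vector. The helper names classic_stat() and robust_stat() and the data are hypothetical and are not part of anomalize or of @raunakms's code. The sketch covers only the per-observation statistic, not the full iterative test with its critical values. Note that R's mad() multiplies by 1.4826 by default.

```r
# Illustrative data: eight ordinary values and one obvious outlier.
x <- c(2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 1.8, 2.0, 9.5)

# Statistic used by the textbook GESD test: deviation from the mean,
# scaled by the sample standard deviation.
classic_stat <- function(x) abs(x - mean(x)) / sd(x)

# Statistic described above for anomalize::gesd(): deviation from the
# median, scaled by the median absolute deviation (default constant 1.4826).
robust_stat <- function(x) abs(x - median(x)) / mad(x)

classic <- classic_stat(x)
robust  <- robust_stat(x)

# Both pick the same most extreme point here, but the scores differ in size,
# so they cannot be compared against the same GESD critical values.
which.max(classic)
which.max(robust)
max(classic)
max(robust)
```

Because the outlier inflates both mean(x) and sd(x) but barely moves median(x) and mad(x), the median/MAD score for the extreme point is much larger than the mean/SD score on this data.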

mdancho84 commented 5 years ago

If you investigate Twitter's GESD method, you will find that anomalize uses the same implementation. The rationale is that combining Twitter's trend removal, time_decompose(method = "twitter"), with anomalize(method = "gesd") should produce a scalable version of Twitter's AnomalyDetection R package.
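For reference, the combination described above might look like the following sketch. It assumes the anomalize and dplyr packages are installed and uses the tidyverse_cran_downloads dataset that ships with anomalize. The choice of the "lubridate" series is only for illustration.

```r
library(dplyr)
library(anomalize)

# Take one daily download series from the bundled example data.
result <- tidyverse_cran_downloads %>%
  filter(package == "lubridate") %>%
  ungroup() %>%
  # Twitter-style decomposition: seasonal component plus piecewise medians.
  time_decompose(count, method = "twitter") %>%
  # GESD-style detection on the remainder (median/MAD based, as discussed above).
  anomalize(remainder, method = "gesd")

# Tabulate how many observations were flagged.
table(result$anomaly)
```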

mdancho84 commented 5 years ago

The Twitter AnomalyDetection algorithm has fairly good documentation here: https://arxiv.org/pdf/1704.07706.pdf