business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/
338 stars 60 forks source link

Sign error in limits of GESD method #46

Closed frseguin closed 4 years ago

frseguin commented 4 years ago

I believe there is a sign error in the GESD method in anomalize.methods.R. https://github.com/business-science/anomalize/blob/2d6ba491d604bb2a5c4076ac597084a38a216fbf/R/anomalize_methods.R#L188

Specifically, the issue is concerning the derivation of the upper bound derived from the critical value from the GESD method. These bounds are stored in the outlier_report element of the output when using verbose = TRUE in the gesd function. The bounds are derived from the equation

\frac{\left|x_m - \mathrm{median}(X)\right|}{\mathrm{MAD}} \leq \lambda

which after rearranging gives

\mathrm{median}(X) - \lambda \cdot \mathrm{MAD}  \leq x_m \leq \lambda \cdot \mathrm{MAD} + \mathrm{median}(X)

In particular, the right hand side is \lambda \cdot \mathrm{MAD} + \mathrm{median}(X) and not \lambda \cdot \mathrm{MAD} - \mathrm{median}(X)

I believe changing line 188 to

limit_upper = critical_value * mad + median

would fix the issue.

technocrat commented 4 years ago

I concur; once stated that way it's algebraically obvious that $lower_limit = critical_value = upper_limit$, which your proposed fix corrects.

mdancho84 commented 4 years ago

Thanks both @frseguin & @technocrat. I will update the package shortly.

mdancho84 commented 4 years ago

Should be fixed now in the development version. Can you check to make sure it works for you?

frseguin commented 4 years ago

To be clear @technocrat the lower bound was actually fine. The issue was that $upper_limit = -$lower_limit, which happens to work in the case where the median is zero. This is why the issue was not detected in the example provided in the vignette where the data is normal(0,1).

The changes of commit bf158eba66ca3f8dbb912b8beb425fa06e6ad262 does fix the issue completely, and I have confirmed that everything is working as far as I am concerned.

Thank you @mdancho84 for the fast reply!