business-science / anomalize

Tidy anomaly detection
https://business-science.github.io/anomalize/

gesd marks second smallest value as an outlier, but not the smallest value #15

Open · tmaravina opened this issue 6 years ago


```r
library(anomalize)

tmp <- c(5.458, 5.515, 5.504, 5.358, 5.522, 5.398, 5.531, 5.439, 5.348, 5.538)
cbind(tmp, gesd(tmp, alpha = 0.05, max_anoms = 0.2))
```

```
 [1,] "5.458" "No" 
 [2,] "5.515" "No" 
 [3,] "5.504" "No" 
 [4,] "5.358" "Yes"
 [5,] "5.522" "No" 
 [6,] "5.398" "No" 
 [7,] "5.531" "No" 
 [8,] "5.439" "No" 
 [9,] "5.348" "No" 
[10,] "5.538" "No" 
```

Counter-intuitive output: observation #4 (5.358), which is marked as an outlier, is not the most extreme value. Observation #9 (5.348) is smaller, yet it is not flagged.
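A quick base-R check of the ordering in `tmp`: observation #9 is both the minimum and the value farthest from the sample median, with observation #4 in second place on both counts. This uses only base R and does not involve anomalize.

```r
tmp <- c(5.458, 5.515, 5.504, 5.358, 5.522, 5.398, 5.531, 5.439, 5.348, 5.538)

# Positions of the two smallest values, smallest first
head(order(tmp), 2)
# [1] 9 4

# Position of the value farthest from the sample median (5.481)
which.max(abs(tmp - median(tmp)))
# [1] 9
```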

```r
gesd(tmp, alpha = 0.05, max_anoms = 0.2, verbose = TRUE)$outlier_report
```

```
# A tibble: 2 x 7
   rank index value limit_lower limit_upper outlier direction
1  1.00  9.00  5.35        5.32       -5.32 No      NA
2  2.00  4.00  5.36        5.39       -5.39 Yes     Up
```

Shouldn't this report imply two outliers: not only observation #4 (the second smallest value) but also every candidate ranked before it, namely observation #9 (the actual minimum)?
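For reference, in Rosner's generalized ESD procedure the number of outliers is the largest step i at which the test statistic R_i exceeds its critical value λ_i, and all candidates removed at steps 1..i are declared outliers, even if an earlier R_j fell below its own λ_j. The sketch below is a minimal R implementation of that textbook rule, assuming the classic mean/standard-deviation statistic. `gesd_rosner()` is a hypothetical helper, not part of anomalize, and anomalize's own `gesd()` may compute its statistic and limits differently.

```r
# Hypothetical helper (not part of anomalize): textbook generalized ESD test
# (Rosner, 1983) using the sample mean and standard deviation.
gesd_rosner <- function(x, alpha = 0.05, max_anoms = 0.2) {
  n <- length(x)
  r <- floor(n * max_anoms)        # maximum number of candidate outliers
  idx <- seq_len(n)                # original positions still in the sample
  x_cur <- x
  R <- numeric(r)                  # test statistics R_i
  lambda <- numeric(r)             # critical values lambda_i
  removed <- integer(r)            # original position removed at step i

  for (i in seq_len(r)) {
    dev <- abs(x_cur - mean(x_cur))
    j <- which.max(dev)            # most extreme remaining value
    R[i] <- dev[j] / sd(x_cur)
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda[i] <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    removed[i] <- idx[j]
    x_cur <- x_cur[-j]
    idx <- idx[-j]
  }

  # Number of outliers = largest i with R_i > lambda_i (0 if none).
  # Every candidate removed at steps 1..n_out is an outlier, even if an
  # earlier R_j did not exceed its own lambda_j.
  exceed <- which(R > lambda)
  n_out <- if (length(exceed) > 0) max(exceed) else 0L

  is_outlier <- rep(FALSE, n)
  is_outlier[removed[seq_len(n_out)]] <- TRUE

  list(
    report = data.frame(rank = seq_len(r), index = removed, value = x[removed],
                        R = R, lambda = lambda, outlier = seq_len(r) <= n_out),
    is_outlier = is_outlier
  )
}

# Toy series in which two large values mask each other
res <- gesd_rosner(c(1:8, 50, 51))
res$report
which(res$is_outlier)
```

In this toy series, 50 and 51 jointly inflate the standard deviation, so R_1 (for 51) stays below λ_1 while R_2 (for 50) exceeds λ_2. The "largest i" rule therefore flags both positions 9 and 10, which is the behaviour the question above expects for observations #9 and #4.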